LNCS 2031 




Tiziana Margaria Wang Yi (Eds.)



Tools and Algorithms 
for the Construction 
and Analysis of Systems 

7th International Conference, TACAS 2001 
Held as Part of the Joint European Conferences 
on Theory and Practice of Software, ETAPS 2001 
Genova, Italy, April 2001, Proceedings 




Springer




Lecture Notes in Computer Science 2031

Edited by G. Goos, J. Hartmanis and J. van Leeuwen




Berlin 

Heidelberg 

New York 

Barcelona 

Hong Kong 

London 

Milan 

Paris 

Singapore 

Tokyo 




Tiziana Margaria Wang Yi (Eds.) 



7th International Conference, TACAS 2001 
Held as Part of the Joint European Conferences 
on Theory and Practice of Software, ETAPS 2001 
Genova, Italy, April 2-6, 2001 
Proceedings 




Series Editors 



Gerhard Goos, Karlsruhe University, Germany 
Juris Hartmanis, Cornell University, NY, USA 
Jan van Leeuwen, Utrecht University, The Netherlands 

Volume Editors 
Tiziana Margaria 

Universität Dortmund, Lehrstuhl für Programmiersysteme
Baroper Str. 301, 44221 Dortmund, Germany 
E-mail: tiziana@ls5.cs.uni-dortmund.de 

Wang Yi

Uppsala University, Department of Information Technology 
Box 337, 751 05 Uppsala, Sweden 
E-mail: yi@docs.uu.se 

Cataloging-in-Publication Data applied for 

Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Tools and algorithms for the construction and analysis of systems : 

7th international conference ; proceedings / TACAS 2001, held as part 
of the Joint European Conferences on Theory and Practice of Software, 
ETAPS 2001, Genova, Italy, April 2-6, 2001. Tiziana Margaria ; Wang 
Yi (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Hong Kong ; 
London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 2001 
(Lecture notes in computer science ; Vol. 2031) 

ISBN 3-540-41865-2 



CR Subject Classification (1998): E.3, D.2.4, D.2.2, C.2.4, E.2.2 
ISSN 0302-9743 

ISBN 3-540-41865-2 Springer-Verlag Berlin Heidelberg New York



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law. 

Springer-Verlag Berlin Heidelberg New York
a member of BertelsmannSpringer Science+Business Media GmbH

http://www.springer.de

© Springer-Verlag Berlin Heidelberg 2001 
Printed in Germany 

Typesetting: Camera-ready by author, data conversion by Claudia Herbers, Dortmund 
Printed on acid-free paper SPIN: 10782361 06/3142 5 4 3 2 1 0 




Foreword 



ETAPS 2001 was the fourth instance of the European Joint Conferences on 
Theory and Practice of Software. ETAPS is an annual federated conference that 
was established in 1998 by combining a number of existing and new conferences. 
This year it comprised five conferences (FOSSACS, FASE, ESOP, CC, TACAS),
ten satellite workshops (CMCS, ETI Day, JOSES, LDTA, MMAABS, PEM, 
RelMiS, UNIGRA, WADT, WTUML), seven invited lectures, a debate, and ten 
tutorials. 

The events that comprise ETAPS address various aspects of the system de- 
velopment process, including specification, design, implementation, analysis, and 
improvement. The languages, methodologies, and tools which support these ac- 
tivities are all well within its scope. Different blends of theory and practice are 
represented, with an inclination towards theory with a practical motivation on 
one hand and soundly-based practice on the other. Many of the issues involved 
in software design apply to systems in general, including hardware systems, and 
the emphasis on software is not intended to be exclusive. 

ETAPS is a loose confederation in which each event retains its own identity, 
with a separate program committee and independent proceedings. Its format is 
open-ended, allowing it to grow and evolve as time goes by. Contributed talks 
and system demonstrations are in synchronized parallel sessions, with invited 
lectures in plenary sessions. Two of the invited lectures are reserved for “uni- 
fying” talks on topics of interest to the whole range of ETAPS attendees. The 
aim of cramming all this activity into a single one-week meeting is to create a 
strong magnet for academic and industrial researchers working on topics within 
its scope, giving them the opportunity to learn about research in related areas, 
and thereby to foster new and existing links between work in areas that were 
formerly addressed in separate meetings. 

ETAPS 2001 was hosted by the Dipartimento di Informatica e Scienze
dell'Informazione (DISI) of the Università di Genova and was organized by the
following team:

Egidio Astesiano (General Chair) 

Eugenio Moggi (Organization Chair) 

Maura Cerioli (Satellite Events Chair) 

Gianna Reggio (Publicity Chair) 

Davide Ancona 
Giorgio Delzanno 
Maurizio Martelli 

with the assistance of Convention Bureau Genova. Tutorials were organized by 
Bernhard Rumpe (TU München). Overall planning for ETAPS conferences is the
responsibility of the ETAPS Steering Committee, whose current membership is: 



Egidio Astesiano (Genova), Ed Brinksma (Enschede), Pierpaolo Degano
(Pisa), Hartmut Ehrig (Berlin), José Fiadeiro (Lisbon), Marie-Claude
Gaudel (Paris), Susanne Graf (Grenoble), Furio Honsell (Udine), Nigel
Horspool (Victoria), Heinrich Hußmann (Dresden), Paul Klint (Amsterdam),
Daniel Le Métayer (Rennes), Tom Maibaum (London), Tiziana
Margaria (Dortmund), Ugo Montanari (Pisa), Mogens Nielsen (Aarhus),
Hanne Riis Nielson (Aarhus), Fernando Orejas (Barcelona), Andreas
Podelski (Saarbrücken), David Sands (Göteborg), Don Sannella (Edinburgh),
Perdita Stevens (Edinburgh), Jerzy Tiuryn (Warsaw), David
Watt (Glasgow), Herbert Weber (Berlin), Reinhard Wilhelm (Saarbrücken)

ETAPS 2001 was organized in cooperation with 
the Association for Computing Machinery
the European Association for Programming Languages and Systems
the European Association of Software Science and Technology
the European Association for Theoretical Computer Science

and received generous sponsorship from:

ELSAG
Fondazione Cassa di Risparmio di Genova e Imperia
INDAM - Gruppo Nazionale per l'Informatica Matematica (GNIM)
Marconi
Microsoft Research
Telecom Italia
TXT e-solutions
Università di Genova

I would like to express my sincere gratitude to all of these people and
organizations, the program committee chairs and PC members of the ETAPS
conferences, the organizers of the satellite events, the speakers themselves, and finally
Springer-Verlag for agreeing to publish the ETAPS proceedings.
Donald Sannella
ETAPS Steering Committee chairman 




Preface 



This volume contains the proceedings of the 7th TACAS, International
Conference on Tools and Algorithms for the Construction and Analysis of Systems.
TACAS 2001 took place in Genova, Italy, April 2nd to 6th, 2001, as part of the
4th European Joint Conference on Theory and Practice of Software (ETAPS), 
whose aims, organization, and history are detailed in the separate foreword by 
Donald Sannella. 

It is the goal of TACAS to bring together researchers and practitioners inter- 
ested in the development and application of tools and algorithms for specifica- 
tion, verification, analysis, and construction of software and hardware systems. In 
particular, it aims at creating an atmosphere that promotes a cross-fertilization 
of ideas between the different communities of theoreticians, tool builders, tool 
users, and system designers, in various specialized areas of computer science. In 
this respect, TACAS reflects the overall goal of ETAPS on a tool-oriented foot- 
ing. In fact, the scope of TACAS intersects with all the other ETAPS events, 
which address more traditional areas of interest. 

As a consequence, in addition to the standard criteria for acceptance, contri- 
butions have also been selected on the basis of their conceptual significance in 
the context of neighboring areas. This comprises the profile-driven comparison 
of various concepts and methods, their degree of support via interactive or fully 
automatic tools, and in particular case studies revealing the application profiles 
of the considered methods and tools. 

In order to emphasize the practical importance of tools, TACAS allows tool 
presentations to be submitted (and reviewed) on equal footing with traditional 
scientific papers, treating them as “first class citizens”. In practice, this entails
their presentation in plenary conference sessions, and the integral inclusion of a 
tool report in the proceedings. The conference, of course, also included informal 
tool demonstrations, not announced in the official program. 

TACAS 2001 comprised 

— Invited Lectures by Moshe Vardi on Branching vs. Linear Time: Final 
Showdown and by Michael Fourman, Propositional Reasoning, as well

as 

— Regular Sessions featuring 36 papers selected out of 125 submissions, rang- 
ing from foundational contributions to tool presentations including online 
demos, and 

— ETAPS Tool Demonstrations, featuring 3 short contributions selected 
out of 9 submissions. 

Having itself grown out of a satellite meeting of TAPSOFT in 1995, TACAS 2001
featured the ETI DAY, a satellite event of ETAPS concerning the future
development of the Electronic Tool Integration platform.







TACAS 2001 was hosted by the University of Genova, and, being part of ETAPS, 
it shared the sponsoring and support described in Don Sannella's foreword. Like
ETAPS, TACAS will take place in Grenoble next year. 

Warm thanks are due to the program committee and to all the referees for
their assistance in selecting the papers, to the TACAS Steering Committee, to 
Don Sannella for mastering the coordination of the whole ETAPS, to Egidio 
Astesiano, Gianna Reggio, and the whole team in Genova for their brilliant or- 
ganization. 

Recognition is due to the technical support team: Matthias Weiß at the
University of Dortmund together with Ben Lindner and Martin Karusseit of
METAFrame Technologies provided invaluable assistance to all the involved
people concerning the online service during the past three months.

Finally, we are deeply indebted to Claudia Herbers for her first class support
in the preparation of this volume. 
Branching vs. Linear Time: Final Showdown



Moshe Y. Vardi*

Rice University, Department of Computer Science, Houston, TX 77005-1892, USA 



Abstract. The discussion of the relative merits of linear- versus branching-time 
frameworks goes back to the early 1980s. One of the beliefs dominating this
discussion has been that “while specifying is easier in LTL (linear-temporal logic),
verification is easier for CTL (branching-temporal logic)”. Indeed, the restricted 
syntax of CTL limits its expressive power and many important behaviors (e.g., 
strong fairness) cannot be specified in CTL. On the other hand, while model
checking for CTL can be done in time that is linear in the size of the specifi- 
cation, it takes time that is exponential in the specification for LTL. Because of 
these arguments, and for historical reasons, the dominant temporal specification 
language in industrial use is CTL. 

In this paper we argue that in spite of the phenomenal success of CTL-based 
model checking, CTL suffers from several fundamental limitations as a specifi- 
cation language, all stemming from the fact that CTL is a branching-time
formalism: the language is unintuitive and hard to use, it does not lend itself to
compositional reasoning, and it is fundamentally incompatible with semi-formal 
verification. These inherent limitations severely impede the functionality of CTL- 
based model checkers. In contrast, the linear-time framework is expressive and 
intuitive, supports compositional reasoning and semi-formal verification, and is 
amenable to combining enumerative and symbolic search methods. While we 
argue in favor of the linear-time framework, we also argue that LTL is not
expressive enough, and discuss what would be the “ultimate” temporal specification
language.



1 Introduction 

As indicated in the National Technology Roadmap for Semiconductors,¹ the
semiconductor industry faces a serious challenge: chip designers are finding it increasingly diffi-
cult to keep up with the advances in semiconductor manufacturing. As a result, they are 
unable to exploit the enormous capacity that this technology provides. The Roadmap 
suggests that the semiconductor industry will require productivity gains greater than the 
historical 30% per-year cost reduction. This is referred to as the “design productivity 
crisis”. 

Integrated circuits are currently designed through a series of steps that refine a more 
abstract specification into a more concrete implementation. The process starts at a “be- 
havioral model”, such as a program that implements the instruction set architecture of 

* Supported in part by NSF grants CCR-9700061 and CCR-9988322, and by a grant from the
Intel Corporation. URL: http://www.cs.rice.edu/~vardi.

¹ http://public.itrs.net/files/1999_SIA_Roadmap/Home.htm

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 1-22, 2001.

© Springer-Verlag Berlin Heidelberg 2001
a processor. It ends in a description of the actual geometries of the transistors and wires
on the chip. Each refinement step used to synthesize the processor must preserve the 
germane behavior of the abstract model. As designs grow more complex, it becomes 
easier to introduce flaws into the design during refinement. Thus, designers use vari- 
ous validation techniques to prove the correctness of the design after each refinement. 
Unfortunately, these techniques themselves grow more expensive and difficult with de- 
sign complexity. Indeed, for many designs, the size of the validation team now exceeds 
that of the design team. As the validation process has begun to exceed half the design 
project resources, the semiconductor industry has begun to refer to this problem as the 
“validation crisis”. 

Formal verification provides a new approach to validating the correct behavior of 
logic designs. In simulation, the traditional mode of design validation, “confidence” is 
the result of running a large number of test cases through the design. Formal verifica- 
tion, in contrast, uses mathematical techniques to check the entire state space of the de- 
sign for conformance to some specified behavior. Thus, while simulation is open-ended 
and fraught with uncertainty, formal verification is definitive and eliminates uncertainty 
[23]. 

One of the most significant recent developments in the area of formal design ver- 
ification is the discovery of algorithmic methods for verifying temporal-logic
properties of finite-state systems [19,68,86,103]. In temporal-logic model checking, we verify
the correctness of a finite-state system with respect to a desired property by checking
whether a labeled state-transition graph that models the system satisfies a temporal logic 
formula that specifies this property (see [22]). Symbolic model checking [14] has been 
used to successfully verify a large number of complex designs. This approach uses sym- 
bolic data structures, such as binary decision diagrams (BDDs), to efficiently represent 
and manipulate state spaces. Using symbolic model checking, designs containing on 
the order of 100 to 200 binary latches can be routinely verified automatically. 

Model-checking tools have enjoyed a substantial and growing use over the last few 
years, showing ability to discover subtle flaws that result from extremely improbable 
events. While until recently these tools were viewed as of academic interest only, they 
are now routinely used in industrial applications [6,42]. Companies such as AT&T, 
Cadence, Fujitsu, HP, IBM, Intel, Motorola, NEC, SGI, Siemens, and Sun are using 
model checkers increasingly on their own designs to ensure outstanding product quality. 
Three model-checking tools are widely used in the semiconductor industry: SMV, a tool 
from Carnegie Mellon University [74], with many industrial incarnations (e.g., IBM’s 
RuleBase [7]); VIS, a tool developed at the University of California, Berkeley [11]; and 
FormalCheck, a tool developed at Bell Labs [44] and marketed by Cadence.

A key issue in the design of a model-checking tool is the choice of the temporal 
language used to specify properties, as this language, which we refer to as the temporal 
property-specification language, is one of the primary interfaces to the tool. (The other
primary interface is the modeling language, which is typically the hardware description 
language used by the designers). One of the major aspects of all temporal languages 
is their underlying model of time. Two possible views regarding the nature of time in- 
duce two types of temporal logics [65]. In linear temporal logics, time is treated as if 
each moment in time has a unique possible future. Thus, linear temporal logic formulas 
are interpreted over linear sequences and we regard them as describing a behavior of a
single computation of a program. In branching temporal logics, each moment in time 
may split into various possible futures. Accordingly, the structures over which branch- 
ing temporal logic formulas are interpreted can be viewed as infinite computation trees, 
each describing the behavior of the possible computations of a nondeterministic
program.

In the linear temporal logic LTL, formulas are composed from the set of atomic
propositions using the usual Boolean connectives as well as the temporal connectives
G (“always”), F (“eventually”), X (“next”), and U (“until”). The branching temporal
logic CTL* augments LTL by the path quantifiers E (“there exists a computation”) and
A (“for all computations”). The branching temporal logic CTL is a fragment of CTL* in
which every temporal connective is preceded by a path quantifier. Finally, the branching
temporal logic ∀CTL is a fragment of CTL in which only universal path quantification is
allowed. (Note that LTL has implicit universal path quantifiers in front of its formulas.)
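
The syntactic restriction that defines CTL inside CTL* can be illustrated with a short sketch. The tuple encoding of formulas and the function name `is_ctl` are assumptions made here for illustration; they do not come from the paper. The check accepts a formula only if every temporal connective sits directly under a path quantifier.

```python
# Formulas are nested tuples; this encoding is an assumption of the sketch.
#   ("ap", "p")                               atomic proposition
#   ("not", f), ("and", f, g), ("or", f, g)   Boolean connectives
#   ("X", f), ("F", f), ("G", f), ("U", f, g) temporal connectives
#   ("A", f), ("E", f)                        path quantifiers

TEMPORAL = {"X", "F", "G", "U"}
QUANTIFIERS = {"A", "E"}


def is_ctl(f):
    """Return True if the CTL* formula f lies in the CTL fragment."""
    op = f[0]
    if op == "ap":
        return True
    if op in {"not", "and", "or"}:
        return all(is_ctl(g) for g in f[1:])
    if op in QUANTIFIERS:
        body = f[1]
        # The quantifier must sit directly in front of one temporal
        # connective whose arguments are again CTL formulas.
        return body[0] in TEMPORAL and all(is_ctl(g) for g in body[1:])
    # A temporal connective with no path quantifier directly in front of it.
    return False


p = ("ap", "p")
AG_p = ("A", ("G", p))
AF_AG_p = ("A", ("F", AG_p))
A_FG_p = ("A", ("F", ("G", p)))  # LTL formula FGp with its implicit A written out

print(is_ctl(AG_p), is_ctl(AF_AG_p), is_ctl(A_FG_p))  # True True False
```

The last formula is rejected because G appears directly under F, with no path quantifier in between.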

The discussion of the relative merits of linear versus branching temporal logics goes 
back to 1980 [80,65,31,8,82,35,33,18,100,101]. As analyzed in [82], linear and branch- 
ing time logics correspond to two distinct views of time. It is not surprising therefore 
that LTL and CTL are expressively incomparable [65,33,18]. The LTL formula FGp 
is not expressible in CTL, while the CTL formula AFAGp is not expressible in LTL. 
On the other hand, CTL seems to be superior to LTL when it comes to algorithmic 
verification, as we now explain. 
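
The gap between FGp and AFAGp can be seen on a small example. The three-state system below is hypothetical and chosen only for illustration, and the helper names are assumptions of this sketch. It does not prove inexpressibility; it only shows that the two formulas disagree on one model. The linear-time check uses the finite-state fact that FGp fails on some path exactly when a reachable state violating p lies on a cycle.

```python
# Hypothetical system: s0 (p) may loop or move to s1 (not p);
# s1 moves to s2 (p); s2 loops forever. The relation is total.
SUCC = {"s0": {"s0", "s1"}, "s1": {"s2"}, "s2": {"s2"}}
P = {"s0", "s2"}  # states labeled with p
STATES = set(SUCC)


def reachable(src):
    """States reachable from src in one or more steps."""
    seen, stack = set(), list(SUCC[src])
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(SUCC[s])
    return seen


def all_paths_FG_p(init):
    """LTL view: FGp holds on every path from init iff no reachable
    state that violates p lies on a cycle."""
    candidates = reachable(init) | {init}
    return not any(s not in P and s in reachable(s) for s in candidates)


def AG(target):
    """Greatest fixpoint: keep states whose successors all stay inside."""
    z = set(target)
    while True:
        nz = {s for s in z if SUCC[s] <= z}
        if nz == z:
            return z
        z = nz


def AF(target):
    """Least fixpoint: add states whose successors are all already inside."""
    z = set(target)
    while True:
        nz = z | {s for s in STATES if SUCC[s] <= z}
        if nz == z:
            return z
        z = nz


print(all_paths_FG_p("s0"))   # True: every path eventually stays in p
print("s0" in AF(AG(P)))      # False: the path s0, s0, s0, ... never reaches an AGp state
```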

Given a transition system M and a linear temporal logic formula φ, the
model-checking problem for M and φ is to decide whether φ holds in all the computations of
M. When φ is a branching temporal logic formula, the problem is to decide whether φ
holds in the computation tree of M. The complexity of model checking for both linear
and branching temporal logics is well understood: suppose we are given a transition
system of size n and a temporal logic formula of size m. For the branching temporal logic
CTL, model-checking algorithms run in time O(nm) [19], while, for the linear
temporal logic LTL, model-checking algorithms run in time n·2^{O(m)} [68]. Since LTL model
checking is PSPACE-complete [91], the latter bound probably cannot be improved.

The difference in the complexity of linear and branching model checking has been 
viewed as an argument in favor of the branching paradigm. In particular, the compu- 
tational advantage of CTL model checking over LTL model checking makes CTL a 
popular choice, leading to efficient model-checking tools for this logic [20]. Today, the 
dominant temporal specification language in industrial use is CTL. This dominance 
stems from the phenomenal success of SMV, the first symbolic model checker, which 
is CTL-based, and its follower VIS, also CTL-based, which serve as the basis for many 
industrial model checkers. (Verification systems that use linear-time formalisms are the
above mentioned FormalCheck, Bell Labs’s SPIN [48], Intel’s Prover, and Cadence 
SMV.) 

In spite of the phenomenal success of CTL-based model checking, CTL suffers from 
several fundamental limitations as a temporal property-specification language, all
stemming from the fact that CTL is a branching-time formalism: the language is unintuitive
and hard to use, it does not lend itself to compositional reasoning, and it is fundamen- 
tally incompatible with semi-formal verification. In contrast, the linear-time framework 
is expressive and intuitive, supports compositional reasoning and semi-formal
verification, and is amenable to combining enumerative and symbolic search methods. While
we argue in favor of the linear-time framework, we also argue that LTL is not
expressive enough, and discuss what would be the ultimate temporal specification language.

We assume familiarity with the syntax and semantics of temporal logic [30,53,94]. 



2 CTL 

2.1 Expressiveness 

It is important to understand that expressiveness is not merely a theoretical issue; ex- 
pressiveness is also a usability issue. Verification engineers find CTL unintuitive. The 
linear framework is simply more natural for verification engineers, who tend to think 
linearly, e.g., timing diagrams [37] and message-sequence charts [64], rather than
“branchingly”. IBM’s experience with the RuleBase system has been that “nontrivial CTL
equations are hard to understand and prone to error” [90] and “CTL is difficult to use 
for most users and requires a new way of thinking about hardware” [7]. Indeed, IBM 
has been trying to “linearize” CTL in their RuleBase system [7]. It is simply much 
harder to reason about computation trees than about linear computations. 

As an example, consider the LTL formulas XFp and FXp. Both formulas say the 
same thing: “p holds sometimes in the strict future”. In contrast, consider the CTL for- 
mulas AFAXp and AXAFp. Are these formulas logically equivalent? Do they assert 
that “p holds sometimes in the strict future”? It takes a few minutes of serious pondering 
to realize that while AXAFp does assert that “p holds sometimes in the strict future”, 
this is not the case for AFAXp (we challenge the reader to figure out the meaning 
of AFAXp). The unintuitiveness of CTL significantly reduces the usability of
CTL-based formal-verification tools. A perusal of the literature reveals that the vast majority
of CTL formulas used in formal verification are actually equivalent to LTL formulas. 
Thus, the branching nature of CTL is very rarely used in practice. As a consequence, 
even though LTL and CTL are expressively incomparable from a theoretical point of 
view, from a practical point of view LTL is more expressive than CTL. 
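
The difference between AXAFp and AFAXp can be checked mechanically on a small example. The three-state system below is hypothetical, and the set-based operators `AX` and `AF` are a minimal sketch of the usual fixpoint-style state labeling, assuming a total transition relation.

```python
# Hypothetical system: x (not p) branches to y (p) and z (not p);
# y returns to x; z moves to y.
SUCC = {"x": {"y", "z"}, "y": {"x"}, "z": {"y"}}
P = {"y"}  # states labeled with p
STATES = set(SUCC)


def AX(target):
    """States all of whose successors lie in target."""
    return {s for s in STATES if SUCC[s] <= target}


def AF(target):
    """Least fixpoint: states from which every path reaches target."""
    z = set(target)
    while True:
        nz = z | AX(z)
        if nz == z:
            return z
        z = nz


print("x" in AX(AF(P)))  # True: every path from x sees p in the strict future
print("x" in AF(AX(P)))  # False: the path x, y, x, y, ... never visits z
```

Here AXp holds only in z, so AFAXp requires every path to reach z, which the path alternating between x and y never does.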

One often hears the claim that expressiveness “is not an issue”, since “all users want
to verify are simple invariance properties of the form AGp”. Of course, the reason for
that could be the difficulty of expressing in CTL more complicated properties. Industrial 
experience with linear-time formalism shows that verification engineers often use much 
more complicated temporal properties, when provided with a language that facilitates 
the expression of such properties. Furthermore, even when attempting to verify an
invariance property, users often need to express relevant properties, which can be rather 
complex, of the environment of the unit under verification. We come back to this point 
later. 

A reader who is steeped in the concurrency-theory literature may be somewhat
surprised at the assertion that CTL lacks expressive power. After all, it is known that
CTL characterizes bisimulation, in the sense that two states in a transition system are 
bisimilar iff they satisfy exactly the same CTL formulas [12] (see also [46]), and bisimu- 
lation is considered to be the finest reasonable notion of equivalence between processes 
[79,76]. This result, however, says little about the usefulness of CTL as a property-
specification language. Bisimulation is a structural relation, while in the context of 
model checking what is needed is a way to specify behavioral properties rather than 
structural properties. Assertions about behavior are best stated in terms of traces rather 
than in terms of computation trees (recall, for example, the subtle distinction between 
AFAXp and AXAFp).²

2.2 Complexity 

As we saw earlier, the complexity bounds for CTL model checking are better than those 
for LTL model checking. We first show that this superiority disappears in the context of 
open systems. 

In computer system design, we distinguish between closed and open systems. A 
closed system is a system whose behavior is completely determined by the state of the 
system. An open system is a system that interacts with its environment and whose
behavior depends on this interaction. Such systems are also called reactive systems [45].
In closed systems, nondeterminism reflects an internal choice, while in open systems
it can also reflect an external choice [47]. Formally, in a closed system, the
environment cannot modify any of the system variables. In contrast, in an open system, the
environment can modify some of the system variables. In reality, the vast majority of 
interesting systems are open, since they have to interact with an external environment. 

We can model finite-state open systems by open modules. An open module is sim- 
ply a module with a partition of the states into two sets. One set contains system states 
and corresponds to locations where the system makes a transition. The second set con- 
tains environment states and corresponds to locations where the environment makes a 
transition. 

As discussed in [72], when the specification is given in linear temporal logic, there is 
indeed no need to worry about uncertainty with respect to the environment. Since all the 
possible interactions of the system with its environment have to satisfy a linear tempo- 
ral logic specification in order for a program to satisfy the specification, the distinction 
between internal and external nondeterminism is irrelevant. In contrast, when the spec- 
ification is given in a branching temporal logic, this distinction is relevant. There is a 
need to define a different model-checking problem for open systems, and there is a need 
to adjust current model-checking tools to handle open systems correctly. 

We now specify formally the problem of model checking of open modules (module
checking, for short). As with usual model checking, the problem has two inputs: an open
module M and a temporal logic formula φ. For an open module M, let V_M denote
the unwinding of M into an infinite tree. We say that M satisfies φ iff φ holds in
all the trees obtained by pruning from V_M subtrees whose root is a successor of an
environment state. The intuition is that each such tree corresponds to a different (and
possible) environment. We want φ to hold in every such tree, since, of course, we want
the open system to satisfy its specification no matter how the environment behaves.

² It is also worth noting that when modeling systems in terms of transition systems, deadlocks
have to be modeled explicitly. Once deadlocks are modeled explicitly, the two processes a(b+c)
and ab+ac, which are typically considered to be trace equivalent but not bisimilar [76],
become trace inequivalent.
A module M = ⟨W, W_0, R, V⟩ consists of a set W of states, a set W_0 of initial
states, a total transition relation R ⊆ W × W, and a labeling function V : W → 2^Prop
that maps each state to a set of atomic propositions that hold in this state. We model
an open system by an open module M = ⟨W_s, W_e, W_0, R, V⟩, where ⟨W_s ∪ W_e,
W_0, R, V⟩ is a module, W_s is a set of system states, and W_e is a set of environment
states. We use W to denote W_s ∪ W_e. For each state w ∈ W, let succ(w) be the set
of w's R-successors; i.e., succ(w) = {w' : R(w, w')}. Consider a system state w_s and
an environment state w_e. When the current state is w_s, all the states in succ(w_s) are
possible next states. In contrast, when the current state is w_e, there is no certainty with
respect to the environment transitions and not all the states in succ(w_e) are necessarily
possible next states. The only thing guaranteed is that not all the environment
transitions are impossible, since the environment can never be blocked. For a state w ∈ W,
let step(w) denote the set of the possible sets of w's next successors during an
execution. By the above, step(w_s) = {succ(w_s)} and step(w_e) contains all the nonempty
subsets of succ(w_e).
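
The definition of step can be sketched directly. The two-state open module below and the names `SUCC`, `SYSTEM`, and `ENV` are assumptions made for illustration; only the definition of step itself comes from the text.

```python
from itertools import combinations

# Hypothetical open module: "s" is a system state, "e" an environment state.
SUCC = {"s": {"e"}, "e": {"s", "e"}}
SYSTEM = {"s"}
ENV = {"e"}


def step(w):
    """Possible sets of next states of w during an execution.

    A system state offers exactly one choice, its full successor set.
    An environment state offers every nonempty subset of its successors.
    """
    succ = sorted(SUCC[w])
    if w in SYSTEM:
        return [frozenset(succ)]
    return [frozenset(c)
            for k in range(1, len(succ) + 1)
            for c in combinations(succ, k)]


print(step("s"))       # one choice: {e}
print(len(step("e")))  # 3 choices: {e}, {s}, {e, s}
```

An environment state with k successors yields 2^k - 1 choices, which is why the number of possible environments grows so quickly.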

An infinite tree is a set T ⊆ ℕ* such that if x · c ∈ T where x ∈ ℕ* and c ∈ ℕ,
then also x ∈ T, and for all 0 ≤ c' < c, we have that x · c' ∈ T. In addition, if
x ∈ T, then x · 0 ∈ T. The elements of T are called nodes, and the empty word ε
is the root of T. Given an alphabet Σ, a Σ-labeled tree is a pair ⟨T, V⟩ where T is
a tree and V : T → Σ maps each node of T to a letter in Σ. An open module M
can be unwound into an infinite tree ⟨T_M, V_M⟩ in a straightforward way. When we
examine a specification with respect to M, it should hold not only in ⟨T_M, V_M⟩ (which
corresponds to a very specific environment that never restricts the set of its next
states), but in all the trees obtained by pruning from ⟨T_M, V_M⟩ subtrees whose root is
a successor of a node corresponding to an environment state. Let exec(M) denote the
set of all these trees. Formally, ⟨T, V⟩ ∈ exec(M) iff the following holds:

- ε ∈ T and V(ε) = w_0.

- For all x ∈ T with V(x) = w, there exists {w_0, ..., w_n} ∈ step(w) such that
T ∩ ℕ^{|x|+1} = {x · 0, x · 1, ..., x · n} and for all 0 ≤ c ≤ n we have V(x · c) = w_c.

Intuitively, each tree in exec(M) corresponds to a different behavior of the
environment. Note that a single environment state with more than one successor suffices to
make exec(M) infinite.

Given an open module M and a CTL* formula φ, we say that M satisfies φ, denoted
M ⊨_o φ, if all the trees in exec(M) satisfy φ. The problem of deciding whether M
satisfies φ is called module checking. We use M ⊨ φ to indicate that when we regard
M as a closed module (thus refer to all its states as system states), then M satisfies φ.
The problem of deciding whether M ⊨ φ is the usual model-checking problem. Note
that while M ⊨_o φ entails M ⊨ φ, all that M ⊨ φ entails is that M ⊭_o ¬φ. Indeed,
M ⊨_o φ requires all the trees in exec(M) to satisfy φ. On the other hand, M ⊨ φ
means that the tree ⟨T_M, V_M⟩ satisfies φ. Finally, M ⊭_o ¬φ only tells us that there
exists some tree in exec(M) that satisfies φ. We can define module checking also with
respect to linear-time specifications. We say that an open module M satisfies an LTL
formula φ iff M ⊨_o Aφ.
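
That model checking does not entail module checking can be seen on the single formula EXp. The sketch below is a minimal illustration under stated assumptions: the three-state module is hypothetical, and the shortcut of inspecting only the first level of each tree is valid only because EXp depends solely on the labels of the root's children.

```python
from itertools import combinations

# Hypothetical open module: the initial state w0 is an environment state
# with successors a (labeled p) and b (not labeled p).
SUCC = {"w0": {"a", "b"}, "a": {"a"}, "b": {"b"}}
ENV = {"w0"}
P = {"a"}


def nonempty_subsets(items):
    """All nonempty subsets of items, as sets."""
    items = sorted(items)
    return [set(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]


def model_check_EX_p(w):
    """Closed reading: some successor of w is labeled p."""
    return any(s in P for s in SUCC[w])


def module_check_EX_p(w):
    """Open reading: EXp must hold whichever successors the environment keeps."""
    choices = nonempty_subsets(SUCC[w]) if w in ENV else [SUCC[w]]
    return all(any(s in P for s in chosen) for chosen in choices)


print(model_check_EX_p("w0"), module_check_EX_p("w0"))  # True False
```

The environment may keep only b, and the resulting pruned tree violates EXp even though the full computation tree satisfies it.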



Theorem 1. [54,61] 
(1) The module-checking problem for LTL is PSPACE-complete.

(2) The module-checking problem for CTL is EXPTIME-complete.

(3) The module-checking problem for CTL* is 2EXPTIME-complete.

Thus, module checking for LTL is easier than for CTL (assuming that EXPTIME is
different from PSPACE), which is, in turn, easier than for CTL*. In particular, this result
shows that branching is not “free”, as has been claimed in [35].^ See [100] for further
discussion on the complexity-theoretic comparison between linear time and branching
time.

Even in the context of closed systems, the alleged superiority of CTL from the com- 
plexity perspective is questionable. The traditional comparison is in terms of worst-case 
complexity. Since, however, CTL and LTL are expressively incomparable, a comparison 
in terms of worst-case complexity is not very meaningful. A more meaningful compar- 
ison would be with respect to properties that can be expressed in both CTL and LTL. 
We claim that under such a comparison the superiority of CTL disappears. 

For simplicity, we consider systems $M$ with no fairness conditions; i.e., systems in
which all the computations are fair. As the “representative” CTL model checker we take
the bottom-up labeling procedure of [19]. There, in order to check whether $M$ satisfies
$\varphi$, we label the states of $M$ by subformulas of $\varphi$, starting from the innermost formulas
and proceeding such that, when labeling a formula, all its subformulas are already
labeled. Labeling subformulas that are atomic propositions, Boolean combinations of
other subformulas, or of the form $AX\theta$ or $EX\theta$ is straightforward. Labeling subformulas
of the form $A\theta_1 U \theta_2$, $E\theta_1 U \theta_2$, $A\theta_1 \tilde{U} \theta_2$, or $E\theta_1 \tilde{U} \theta_2$ involves a backward
reachability test. As the “representative” LTL model checker, we take the automata-based
algorithm of [103]. There, in order to check whether $M$ satisfies $\varphi$, we construct a
Büchi word automaton $A_{\neg\varphi}$ for $\neg\varphi$ and check whether the intersection of the language
of $M$ with that of $A_{\neg\varphi}$ is nonempty. In practice, the latter check proceeds by checking
whether there exists an initial state in the intersection that satisfies the CTL formula
EG true. For the construction of $A_{\neg\varphi}$, we follow the algorithms in [41] or [29], which
improve [103] by being demand-driven; that is, the state space of $A_{\neg\varphi}$ is restricted to
states that are reachable from the initial state.
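The backward reachability test used by the labeling procedure can be illustrated with a minimal sketch. It assumes an explicit-state module whose transition relation is a dictionary `R` from states to successor sets; the function names `label_EX` and `label_EU` are hypothetical and are not taken from [19].

```python
from typing import Dict, Hashable, Set

State = Hashable


def label_EX(R: Dict[State, Set[State]], sat: Set[State]) -> Set[State]:
    """States with at least one successor already labeled by the subformula."""
    return {w for w, succs in R.items() if succs & sat}


def label_EU(R: Dict[State, Set[State]],
             sat1: Set[State], sat2: Set[State]) -> Set[State]:
    """States satisfying E[theta1 U theta2], found by backward reachability.

    sat1 and sat2 are the states already labeled by theta1 and theta2.
    """
    # Invert the transition relation once so predecessors are cheap to find.
    preds: Dict[State, Set[State]] = {w: set() for w in R}
    for w, succs in R.items():
        for s in succs:
            preds[s].add(w)

    labeled = set(sat2)          # theta2-states satisfy the formula outright
    frontier = list(sat2)
    while frontier:
        s = frontier.pop()
        for w in preds[s]:
            # Walk backwards only through theta1-states.
            if w in sat1 and w not in labeled:
                labeled.add(w)
                frontier.append(w)
    return labeled


# Hypothetical four-state module: 0 -> 1 -> 2 -> 2, and 3 -> 3.
R = {0: {1}, 1: {2}, 2: {2}, 3: {3}}
sat_p = {0, 1}   # states labeled p
sat_q = {2}      # states labeled q
print(label_EU(R, sat_p, sat_q))   # {0, 1, 2}
print(label_EX(R, sat_q))          # {1, 2}
```

Each subformula is handled in one pass over the transition relation, which is the source of the linear dependence on the size of the module.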

The exponential term in the running time of LTL model checking comes from a potential
exponential blow-up in the translation of $\neg\varphi$ into an automaton $A_{\neg\varphi}$. It is shown,
however, in [70] that for LTL formulas that can also be expressed in $\forall$CTL (the universal
fragment of CTL) there is a Büchi automaton $A_{\neg\varphi}$ whose size is linear in $|\varphi|$.
Furthermore, this automaton has a special structure (it is “weak”), which enables the
model checker to apply improved algorithms for checking the emptiness of the intersection
of $M$ with $A_{\neg\varphi}$ [10]. (See also [56,57] for a thorough analysis of the relationship
between LTL and CTL model checkers.)



2.3 Compositionality 

Model checking is known to suffer from the so-called state-explosion problem. In a
concurrent setting, the system under consideration is typically the parallel composition
of many modules. As a result, the size of the state space of the system is the product of
the sizes of the state spaces of the participating modules. This gives rise to state spaces
of exceedingly large sizes, which makes model-checking algorithms impractical. This
issue is one of the most important ones in the area of computer-aided verification and is
the subject of active research (cf. [15]).

^ Note also that while the satisfiability problem for LTL is PSPACE-complete [91], the problem
is EXPTIME-complete for CTL [36,32] and 2EXPTIME-complete for CTL* [102,34].

Compositional, or modular, verification is one possible way to address the state-explosion
problem, cf. [24]. In modular verification, one uses proof rules of the following
form:

$$\left.\begin{array}{l} M_1 \models \psi_1 \\ M_2 \models \psi_2 \\ C(\psi_1, \psi_2, \psi) \end{array}\right\} \; M_1 \| M_2 \models \psi$$

Here $M \models \theta$ means that the module $M$ satisfies the formula $\theta$, the symbol “$\|$” denotes
parallel composition, and $C(\psi_1, \psi_2, \psi)$ is some logical condition relating $\psi_1$, $\psi_2$, and $\psi$.

The advantage of using modular proof rules is that it enables one to apply model 
checking only to the underlying modules, which have much smaller state spaces. 

A key observation, see [77,66,50,93,81], is that in modular verification the specifi- 
cation should include two parts. One part describes the desired behavior of the module. 
The other part describes the assumed behavior of the system within which the module 
is interacting. This is called the assume-guarantee paradigm, as the specification describes
what behavior the module is guaranteed to exhibit, assuming that the system
behaves in the promised way. 

For the linear temporal paradigm, an assume-guarantee specification is a pair $(\varphi, \psi)$,
where both $\varphi$ and $\psi$ are linear temporal logic formulas. The meaning of such a pair is
that all the computations of the module are guaranteed to satisfy $\psi$, assuming that all the
computations of the environment satisfy $\varphi$. As observed in [81], in this case the
assume-guarantee pair $(\varphi, \psi)$ can be combined into a single linear temporal logic formula
$\varphi \to \psi$. Thus, model checking a module with respect to assume-guarantee specifications in
which both the assumed and the guaranteed behaviors are linear temporal logic formulas
is essentially the same as model checking the module with respect to linear temporal
logic formulas.

The situation is different for the branching temporal paradigm, where assumptions
are taken to apply to the computation tree of the system within which the module is
interacting [43]. In this framework, a module $M$ satisfies an assume-guarantee pair $(\varphi, \psi)$
iff whenever $M$ is part of a system satisfying $\varphi$, the system also satisfies $\psi$. (As is shown
in [43], this is not equivalent to $M$ satisfying $\varphi \to \psi$.) We call this branching modular
model checking. Furthermore, it is argued in [43], as well as in [26,51,43,27], that in
the context of modular verification it is advantageous to use only universal branching
temporal logic, i.e., branching temporal logic without existential path quantifiers. In a
universal branching temporal logic one can state properties of all computations of a
program, but one cannot state that certain computations exist. Consequently, universal
branching temporal logic formulas have the helpful property that once they are satisfied
in a module, they are satisfied also in every system that contains this module. The focus
in [43] is on using $\forall$CTL, the universal fragment of CTL, for both the assumption and
the guarantee. We now focus on the branching modular model-checking problem, where
assumptions and guarantees are in both $\forall$CTL and in the more expressive $\forall$CTL*, the
universal fragment of CTL*.

Let $M = (W, W_0, R, V)$ and $M' = (W', W'_0, R', V')$ be two modules with sets
$AP$ and $AP'$ of atomic propositions. The composition of $M$ and $M'$, denoted $M \| M'$,
is a module that has exactly those behaviors that are joint to $M$ and $M'$. We define
$M \| M'$ to be the module $(W'', W''_0, R'', V'')$ over the set $AP'' = AP \cup AP'$ of atomic
propositions, where $W'' = (W \times W') \cap \{(w, w') : V(w) \cap AP' = V'(w') \cap AP\}$,
$W''_0 = (W_0 \times W'_0) \cap W''$, $R'' = \{((w, w'), (s, s')) : (w, s) \in R \text{ and } (w', s') \in R'\}$,
and $V''((w, w')) = V(w) \cup V'(w')$ for $(w, w') \in W''$.
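The composition can be sketched directly from this definition. The following is a minimal illustration, assuming each module is given as an explicit tuple `(W, W0, R, V, AP)` with `R` a set of state pairs and `V` a dictionary from states to sets of propositions; the function name `compose` and the example modules are hypothetical. As a design choice that the definition leaves implicit, the sketch keeps only transitions between states of $W''$.

```python
from itertools import product


def compose(M1, M2):
    """Parallel composition of two explicit-state modules (W, W0, R, V, AP)."""
    W1, W01, R1, V1, AP1 = M1
    W2, W02, R2, V2, AP2 = M2

    # Keep only pairs of states that agree on the shared propositions.
    W = {(w, x) for w, x in product(W1, W2)
         if V1[w] & AP2 == V2[x] & AP1}
    W0 = {(w, x) for (w, x) in W if w in W01 and x in W02}
    # Both components move in lockstep; transitions stay inside W (assumption).
    R = {((w, x), (s, t)) for (w, x) in W for (s, t) in W
         if (w, s) in R1 and (x, t) in R2}
    V = {(w, x): V1[w] | V2[x] for (w, x) in W}
    return W, W0, R, V, AP1 | AP2


# Hypothetical modules sharing the proposition "b".
M1 = ({0, 1}, {0}, {(0, 1), (1, 0)},
      {0: frozenset({"a"}), 1: frozenset({"b"})}, frozenset({"a", "b"}))
M2 = ({"x", "y"}, {"x"}, {("x", "y"), ("y", "x")},
      {"x": frozenset({"c"}), "y": frozenset({"b"})}, frozenset({"b", "c"}))

W, W0, R, V, AP = compose(M1, M2)
print(sorted(W))   # [(0, 'x'), (1, 'y')]
```

Only the pairs `(0, 'x')` and `(1, 'y')` survive, because the other two pairs disagree on whether `b` holds.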

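In modular verification, one uses assertions of the form $\langle\varphi\rangle M \langle\psi\rangle$ to specify that
whenever $M$ is part of a system satisfying the universal branching temporal logic formula
$\varphi$, the system satisfies the universal branching temporal logic formula $\psi$ too. Formally,
$\langle\varphi\rangle M \langle\psi\rangle$ holds if $M \| M' \models \psi$ for all $M'$ such that $M \| M' \models \varphi$. Here $\varphi$ is an
assumption on the behavior of the system and $\psi$ is the guarantee on the behavior of the
module. Assume-guarantee assertions are used in modular proof rules of the following
form:

$$\left.\begin{array}{l} \langle\varphi_1\rangle M_1 \langle\psi_1\rangle \\ \langle true\rangle M_1 \langle\varphi_1\rangle \\ \langle\varphi_2\rangle M_2 \langle\psi_2\rangle \\ \langle true\rangle M_2 \langle\varphi_2\rangle \end{array}\right\} \; \langle true\rangle M_1 \| M_2 \langle\psi_1 \wedge \psi_2\rangle$$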



Thus, a key step in modular verification is checking that assume-guarantee assertions
of the form $\langle\varphi\rangle M \langle\psi\rangle$ hold, which we called the branching modular
model-checking problem.

Theorem 2. [60]

(1) The branching modular model-checking problem for $\forall$CTL is PSPACE-complete.

(2) The branching modular model-checking problem for $\forall$CTL* is EXPSPACE-complete.

Thus, in the context of modular model checking, $\forall$CTL has the same computational
complexity as LTL, while $\forall$CTL* is exponentially harder. The fact that the complexity
for $\forall$CTL is the same as the complexity for LTL is, however, somewhat misleading.
$\forall$CTL is simply not expressive enough to express assumptions that are strong enough
to prove the desired guarantee. This motivated Josko to consider modular verification
with guarantees in CTL and assumptions in LTL. Unfortunately, it is shown in [60] that
the EXPSPACE lower bound above applies even for that setting.

Another approach to modular verification for $\forall$CTL is proposed in [43], based on the
following inference rule:

$$\left.\begin{array}{l} M_1 \leq A_1 \\ A_1 \| M_2 \leq A_2 \\ M_1 \| A_2 \models \varphi \end{array}\right\} \; M_1 \| M_2 \models \varphi$$

Here $A_1$ and $A_2$ are modules that serve as assumptions, and $\leq$ is the simulation
refinement relation [75]. In other words, if $M_1$ guarantees the assumption $A_1$, $M_2$ under
the assumption $A_1$ guarantees the assumption $A_2$, and $M_1$ under the assumption $A_2$
guarantees $\varphi$, then we know that $M_1 \| M_2$, under no assumption, guarantees $\varphi$. The
advantage of this rule is that both the $\leq$ and $\models$ relations can be evaluated in polynomial
time. Unfortunately, the simulation relation is much finer than the trace-containment
relation (which is the refinement relation in the linear-time framework). This makes it
exceedingly difficult to come up with the assumptions $A_1$ and $A_2$ above.
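To make the gap between simulation and trace containment concrete, here is a minimal sketch of a simulation check computed as a greatest fixpoint. It is a naive polynomial-time procedure written for illustration, not the algorithm of [43] or [75]; the function names and the two example modules are hypothetical. Transition relations are dictionaries from states to successor sets, and labelings are dictionaries from states to sets of propositions.

```python
def simulation(R1, V1, R2, V2):
    """Largest relation sim such that (s, t) in sim means t simulates s."""
    # Start from all pairs with equal labels, then prune violating pairs.
    sim = {(s, t) for s in R1 for t in R2 if V1[s] == V2[t]}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(sim):
            # Remove (s, t) if some move of s cannot be matched by any move of t.
            if any(all((s2, t2) not in sim for t2 in R2[t]) for s2 in R1[s]):
                sim.discard((s, t))
                changed = True
    return sim


def refines(init1, R1, V1, init2, R2, V2):
    """True iff every initial state of module 1 is simulated by one of module 2."""
    sim = simulation(R1, V1, R2, V2)
    return all(any((s, t) in sim for t in init2) for s in init1)


def lab(*props):
    return frozenset(props)


# "late": a, then x, then a choice between b and c.
R_late = {"p0": {"p1"}, "p1": {"p2", "p3"}, "p2": {"p2"}, "p3": {"p3"}}
V_late = {"p0": lab("a"), "p1": lab("x"), "p2": lab("b"), "p3": lab("c")}

# "early": the choice between b and c is made right after a.
R_early = {"q0": {"q1", "q2"}, "q1": {"q3"}, "q2": {"q4"},
           "q3": {"q3"}, "q4": {"q4"}}
V_early = {"q0": lab("a"), "q1": lab("x"), "q2": lab("x"),
           "q3": lab("b"), "q4": lab("c")}

print(refines({"q0"}, R_early, V_early, {"p0"}, R_late, V_late))  # True
print(refines({"p0"}, R_late, V_late, {"q0"}, R_early, V_early))  # False
```

Both modules have exactly the same computations, so each trace-contains the other, yet the late-choice module is not simulated by the early-choice one. An abstraction that is adequate in the linear-time sense can therefore fail the branching-time refinement test.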

What do CTL users do in practice? In practice, they use the following rule:

$$\left.\begin{array}{l} M_2 \leq A_2 \\ M_1 \| A_2 \models \varphi \end{array}\right\} \; M_1 \| M_2 \models \varphi$$

That is, instead of checking that $M_1 \| M_2 \models \varphi$, one checks that $M_1 \| A_2 \models \varphi$, where
$A_2$ is an abstraction of $M_2$. As CTL model checkers usually do not support the test
$M_2 \leq A_2$, users often rely on their “intuition”, which is typically a “linear intuition”
rather than “branching intuition”.^ In other words, a typical way users overcome the
limited expressive power of CTL is by “escaping” outside the tool; they build the “stub”
$A_2$ in a hardware description language. Unfortunately, since stubs themselves could be
incorrect, this practice is unsafe. (Users often check that the abstraction $A_2$ satisfies
some CTL properties, such as AG EF p, but this is not sufficient to establish that
$M_2 \leq A_2$.)

In summary, CTL is not adequate for modular verification, which explains why 
recent attempts to augment SMV with assume-guarantee reasoning are based on linear 
time reasoning [73]. 



2.4 Semi-formal Verification 

Because of the state-explosion problem, it is unrealistic to expect formal-verification
tools to handle full systems or even large components. At the same time, simulation-based
dynamic validation, while being able to handle large designs, covers only a small
fraction of the design space, due to resource constraints. Thus, it has become clear that
future verification tools need to combine formal and informal verification [106]. The
combined approach is called semi-formal verification (cf. [39]). Such a combination,
however, is rather problematic for CTL-based tools. CTL specifications and model-checking
algorithms are in terms of computation trees; in fact, it is known that there
are CTL formulas, e.g., AF AX p, whose failure cannot be witnessed by a linear
counterexample [18].^ In contrast, dynamic validation is fundamentally linear, as simulation
generates individual computations. Thus, there is an inherent “impedance mismatch”
between the two approaches. This explains why current approaches to semi-formal
verification are limited to invariances, i.e., properties of the form AG p. While many design
errors can be discovered by model checking invariances, modular verification of even
simple invariances often requires rather complicated assumptions on the environment
in which the component under verification operates. Current semi-formal approaches,
however, cannot handle general assumptions. Thus, the restriction of semi-formal
verification to invariances is quite severe, limiting the possibility of integrating CTL-based
model checking in traditional validation environments.

^ Note that linear-time refinement is defined in terms of trace containment, which is a behavioral
relation, while branching-time refinement is defined in terms of simulation, which is a
state-based relation. Thus, constructing an abstraction $A_2$ such that $M_2 \leq A_2$ requires a very deep
understanding of the environment $M_2$.

^ One of the advertised advantages of model checking is that when the model checker returns
a negative answer, that answer is accompanied by a counterexample [21]. Note, however, that
validation engineers are usually interested in linear counterexamples, but there are CTL
formulas whose failure cannot be witnessed by a linear counterexample. In general, CTL-based
model checkers do not always accompany a negative answer by a counterexample. A similar
comment applies to positive witnesses [59].

3 Linear Time 

Our conclusion from the previous section is that CTL-based model checking, while 
phenomenally successful over the last 20 years, suffers from some inherent limitations 
that severely impede its functionality. As we show now, the linear-time approach does 
not suffer from these limitations. 

3.1 The Linear-Time Framework 

LTL is interpreted over computations, which can be viewed as infinite sequences of truth
assignments to the atomic propositions: i.e., a computation is a function $\pi : \mathbb{N} \to 2^{Prop}$
that assigns truth values to the elements of a set $Prop$ of atomic propositions at each
time instant (natural number). For a computation $\pi$ and a point $i \in \mathbb{N}$, the notation
$\pi, i \models \varphi$ indicates that a formula $\varphi$ holds at the point $i$ of the computation $\pi$. For
example, $\pi, i \models X\varphi$ iff $\pi, i + 1 \models \varphi$. We say that $\pi$ satisfies a formula $\varphi$, denoted
$\pi \models \varphi$, iff $\pi, 0 \models \varphi$.
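The pointwise semantics can be exercised on computations that are ultimately periodic, which is enough to represent the counterexamples discussed later. The sketch below is an illustration under stated assumptions: a computation is encoded as a finite list `word` of proposition sets plus an index `loop` to which the last position jumps back, and formulas are nested tuples. The encoding and the names `sat_positions` and `holds` are hypothetical; only the connectives X and U (plus Boolean ones) are covered.

```python
def sat_positions(formula, word, loop):
    """Positions of the lasso-shaped computation at which `formula` holds."""
    n = len(word)

    def succ(i):
        # The position after the last one is the loop entry point.
        return i + 1 if i + 1 < n else loop

    op = formula[0]
    if op == "true":
        return set(range(n))
    if op == "ap":
        return {i for i in range(n) if formula[1] in word[i]}
    if op == "not":
        return set(range(n)) - sat_positions(formula[1], word, loop)
    if op == "and":
        return (sat_positions(formula[1], word, loop)
                & sat_positions(formula[2], word, loop))
    if op == "X":
        inner = sat_positions(formula[1], word, loop)
        return {i for i in range(n) if succ(i) in inner}
    if op == "U":
        left = sat_positions(formula[1], word, loop)
        right = sat_positions(formula[2], word, loop)
        # Least fixpoint: right holds now, or left holds now and U holds next.
        result = set(right)
        changed = True
        while changed:
            changed = False
            for i in range(n):
                if i not in result and i in left and succ(i) in result:
                    result.add(i)
                    changed = True
        return result
    raise ValueError(f"unknown operator: {op}")


def holds(formula, word, loop):
    """pi |= formula iff pi, 0 |= formula."""
    return 0 in sat_positions(formula, word, loop)


# The computation p, p, q, q, q, ... (position 2 repeats forever).
word = [{"p"}, {"p"}, {"q"}]
p, q = ("ap", "p"), ("ap", "q")
always_q = ("not", ("U", ("true",), ("not", q)))   # G q as not F not q

print(holds(("U", p, q), word, 2))         # True
print(holds(("X", ("X", p)), word, 2))     # False
print(sat_positions(always_q, word, 2))    # {2}
```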

Designs can be described in a variety of formal description formalisms. Regardless
of the formalism used, a finite-state design can be abstractly viewed as a labeled transition
system, i.e., as a module $M = (W, W_0, R, V)$, where $W$ is the finite set of states
that the system can be in, $W_0 \subseteq W$ is the set of initial states of the system, $R \subseteq W^2$
is a total transition relation that indicates the allowable state transitions of the system,
and $V : W \to 2^{Prop}$ assigns truth values to the atomic propositions in each state of
the system. A path in $M$ that starts at $u$ is a possible infinite behavior of the system
starting at $u$, i.e., it is an infinite sequence $u_0, u_1, \ldots$ of states in $W$ such that $u_0 = u$,
and $(u_i, u_{i+1}) \in R$ for all $i \ge 0$.^ The sequence $V(u_0), V(u_1), \ldots$ is a computation of $M$
that starts at $u$. The language of $M$, denoted $L(M)$, consists of all computations of $M$
that start at a state in $W_0$. We say that $M$ satisfies an LTL formula $\varphi$ if all computations
in $L(M)$ satisfy $\varphi$.

The verification problem for LTL is to check whether a transition system $P$ satisfies
an LTL formula $\varphi$. The verification problem for LTL can be solved in time $O(|P| \cdot 2^{O(|\varphi|)})$
[68]. In other words, there is a model-checking algorithm for LTL whose running time
is linear in the size of the program and exponential in the size of the specification. This
is acceptable since the size of the specification is typically significantly smaller than the
size of the program.

^ It is important to consider infinite paths, since we are interested in ongoing computations.
Deadlock and termination can be modeled explicitly via sink states.

The dominant approach today to LTL model checking is the automata-theoretic approach
[103] (see also [99]). The key idea underlying the automata-theoretic approach
is that, given an LTL formula $\varphi$, it is possible to construct a finite-state automaton $A_\varphi$
that accepts all computations that satisfy $\varphi$. The type of finite automata on infinite
words we consider is the one defined by Büchi [13] (cf. [96]). A Büchi automaton is
a tuple $A = (\Sigma, S, S_0, \rho, F)$, where $\Sigma$ is a finite alphabet, $S$ is a finite set of states,
$S_0 \subseteq S$ is a set of initial states, $\rho : S \times \Sigma \to 2^S$ is a nondeterministic transition
function, and $F \subseteq S$ is a set of accepting states. A run of $A$ over an infinite word
$w = a_1 a_2 \cdots$ is a sequence $s_0 s_1 \cdots$, where $s_0 \in S_0$ and $s_i \in \rho(s_{i-1}, a_i)$ for all $i \ge 1$.
A run $s_0, s_1, \ldots$ is accepting if there is some designated state that repeats infinitely
often, i.e., for some $s \in F$ there are infinitely many $i$'s such that $s_i = s$. The infinite
word $w$ is accepted by $A$ if there is an accepting run of $A$ over $w$. The language of
infinite words accepted by $A$ is denoted $L(A)$. The following fact establishes the
correspondence between LTL and Büchi automata: Given an LTL formula $\varphi$, one can build
a Büchi automaton $A_\varphi = (\Sigma, S, S_0, \rho, F)$, where $\Sigma = 2^{Prop}$ and $|S| \le 2^{O(|\varphi|)}$, such
that $L(A_\varphi)$ is exactly the set of computations satisfying the formula $\varphi$ [104].

This correspondence enables the reduction of the verification problem to an
automata-theoretic problem as follows [103]. Suppose that we are given a system $M$ and an
LTL formula $\varphi$: (1) construct the automaton $A_{\neg\varphi}$ that corresponds to the negation of
the formula $\varphi$, (2) take the product of the system $M$ and the automaton $A_{\neg\varphi}$ to obtain
an automaton $A_{M,\varphi}$, and (3) check that the automaton $A_{M,\varphi}$ is nonempty, i.e., that it
accepts some input. If it does not, then the design is correct. If it does, then the design is
incorrect and the accepted input is an incorrect computation. The incorrect computation
is presented to the user as a finite trace, possibly followed by a cycle.
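Steps (2) and (3) can be sketched as follows, assuming the automaton for the negated formula is supplied by hand rather than produced by a translation from LTL. The function names are hypothetical, the transition function `delta` stands for $\rho$, and the emptiness test is a deliberately simple quadratic search (an accepting product state that can reach itself), not the optimized procedures used in practice.

```python
from collections import deque


def reachable(start, succ):
    """All states reachable from the set `start` under the successor function."""
    seen = set(start)
    queue = deque(start)
    while queue:
        q = queue.popleft()
        for q2 in succ(q):
            if q2 not in seen:
                seen.add(q2)
                queue.append(q2)
    return seen


def violates(W0, R, V, S0, delta, F):
    """True iff the product of M with the automaton for the negated formula
    is nonempty, i.e., M has a computation accepted by that automaton."""
    def succ(q):
        u, s = q
        # The automaton reads the label of the current system state.
        return {(u2, s2) for u2 in R[u] for s2 in delta(s, V[u])}

    init = {(u, s) for u in W0 for s in S0}
    for q in reachable(init, succ):
        # Nonempty iff some reachable accepting state lies on a cycle.
        if q[1] in F and q in reachable(succ(q), succ):
            return True
    return False


# Hand-built automaton for the negation of "always p" (eventually not p).
def delta(s, letter):
    if s == "bad":
        return {"bad"}
    return {"wait"} if "p" in letter else {"wait", "bad"}


R = {0: {1}, 1: {0}}
V_faulty = {0: frozenset({"p"}), 1: frozenset()}
V_correct = {0: frozenset({"p"}), 1: frozenset({"p"})}

print(violates({0}, R, V_faulty, {"wait"}, delta, {"bad"}))    # True
print(violates({0}, R, V_correct, {"wait"}, delta, {"bad"}))   # False
```

A production checker would also return the lasso-shaped path it found, which is the finite trace followed by a cycle mentioned above.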

The linear-time framework is not limited to using LTL as a specification language.
There are those who prefer to use automata on infinite words as a specification formalism
[104]; in fact, this is the approach of FormalCheck [62]. In this approach, we are
given a design represented as a finite transition system $M$ and a property represented
by a Büchi (or a related variant) automaton $P$. The design is correct if all computations
in $L(M)$ are accepted by $P$, i.e., $L(M) \subseteq L(P)$. This approach is called the
language-containment approach. To verify $M$ with respect to $P$ we: (1) construct the automaton
$P^c$ that complements $P$, (2) take the product of the system $M$ and the automaton $P^c$ to
obtain an automaton $A_{M,P}$, and (3) check that the automaton $A_{M,P}$ is nonempty. As
before, the design is correct iff $A_{M,P}$ is empty.

3.2 Advantages 

The advantages of the linear-time framework are:

- Expressiveness: The linear framework is more natural for verification engineers.
In the linear framework both designs and properties are represented as finite-state
machines (we saw that even LTL formulas can be viewed as finite-state machines);
thus verification engineers employ the same conceptual model when thinking about
the implementation and the specification [67].

- Compositionality: The linear framework supports the assume-guarantee methodology.
An assumption on the environment is simply expressed as a property $E$.
Thus, instead of checking that $L(M) \subseteq L(P)$, we check that $L(M) \cap L(E) \subseteq
L(P)$ [81]. Furthermore, we can add assumptions incrementally. Given assumptions
$E_1, \ldots, E_k$, one needs to check that $L(M) \cap L(E_1) \cap \ldots \cap L(E_k) \subseteq L(P)$.
The linear formalism is strong enough to express very general assumptions, as
it can describe arbitrary finite-state machines, nondeterminism, and fairness. In
fact, it is known that to prove linear-time properties of the parallel composition
$M \| E_1 \| \ldots \| E_k$ it suffices to consider the linear-time properties of the
components $M, E_1, \ldots, E_k$ [71].

- Semi-formal verification: As we saw, in the linear framework language containment
is reduced to language emptiness, i.e., a search for a single computation satisfying
some conditions. But this is precisely the same principle underlying dynamic
validation. Thus, the linear framework offers support for search procedures that
can be varied continuously from dynamic validation to full formal verification. This
means that techniques for semi-formal verification can be applied not only to invariances
but to much more general properties and can also accommodate assumptions
on the environment [58]. In particular, linear-time properties can be compiled into
“checkers” of simulation traces, facilitating the integration of formal verification
with a traditional validation environment [58]. Such checkers can also be run as
run-time monitors, which can issue an error message during a run in which a safety
property is violated [16].

- Property-specific abstraction: Abstraction is a powerful technique for combating
state explosion. An abstraction suppresses information from a concrete state space
by mapping it into a smaller, abstract state space. As we saw, language containment
is reduced to checking emptiness of a system $A_{M,\varphi}$ (or $A_{M,P}$) that combines the
design with the complement of the property. Thus, one can search for abstractions
that are tailored to the specific property being checked, resulting in more dramatic
state-space reductions [38].

- Combined methods: Nonemptiness of automata can be tested enumeratively [25]
or symbolically [97]. Recent work has shown that for invariances enumerative and
symbolic methods can be combined [89]. Since in the linear framework model
checking of general safety properties can be reduced to invariance checking of the
composite system $A_{M,\varphi}$ (or $A_{M,P}$) [58], the enumerative-symbolic approach can
be applied to a large class of properties and can also handle assumptions.

- Uniformity: The linear framework offers a uniform treatment of model checking,
abstraction, and refinement [1], as all are expressed as language containment. For
example, to show that a design $P_1$ is a refinement of a design $P_2$, we have to check that
$L(P_1) \subseteq L(P_2)$. Similarly, one abstracts a design $M$ by generating a design $M'$
that has more behaviors than $M$, i.e., $L(M) \subseteq L(M')$. Thus, an implementor can
focus on an efficient implementation of the language-containment test. This means
that a linear-time model checker can also be used to check for sequential equivalence
of finite-state machines [49]. Furthermore, the automata-theoretic approach
can be easily adapted to perform quantitative timing analysis, which computes minimum
and maximum delays over a selected subset of system executions [17].
- Bounded Model Checking: In linear-time model checking one searches for a
counterexample trace, finite or infinite, which falsifies the desired temporal property.
In bounded model checking, the search is restricted to a trace of a bounded
length, in which the bound is selected before the search. The motivating idea is that
many errors can be found in traces of relatively small length (say, less than 40 cycles).
The restriction to bounded-length traces enables a reduction to propositional
satisfiability (SAT). It was recently shown that SAT-based model checking can often
significantly outperform BDD-based model checkers [9]. As bounded model
checking is essentially a search for counterexample traces of bounded length, it
fits naturally within the linear-time framework, but does not fit the branching-time
framework.
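The idea of searching only for bounded-length counterexample traces can be sketched without a SAT solver. The following illustration swaps in explicit breadth-first enumeration for the propositional encoding of [9], and handles only invariance properties; the function name `bounded_counterexample`, the predicate `is_bad`, and the example system are hypothetical.

```python
def bounded_counterexample(W0, R, is_bad, k):
    """Search for a trace with at most k transitions that reaches a bad state.

    Returns the trace as a list of states, or None if no violation exists
    within the bound (which does not prove the invariant holds in general).
    """
    frontier = [(w,) for w in W0]
    seen = set(W0)
    for _depth in range(k + 1):
        next_frontier = []
        for trace in frontier:
            last = trace[-1]
            if is_bad(last):
                return list(trace)
            for s in R[last]:
                if s not in seen:          # each state is expanded only once
                    seen.add(s)
                    next_frontier.append(trace + (s,))
        frontier = next_frontier
    return None


# Hypothetical design: a counter modulo 8 that must never reach the value 5.
R = {i: {(i + 1) % 8} for i in range(8)}

print(bounded_counterexample({0}, R, lambda w: w == 5, 3))  # None
print(bounded_counterexample({0}, R, lambda w: w == 5, 5))  # [0, 1, 2, 3, 4, 5]
```

With bound 3 the search reports nothing, while bound 5 exposes the violating trace, which illustrates why the choice of bound matters and why a bounded search is a bug-finding method rather than a full proof.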

3.3 Beyond LTL 

Since the proposal by Pnueli [80] to apply LTL to the specification and verification of
concurrent programs, the adequacy of LTL has been widely studied. One of the conclusions
of this research is that LTL is not expressive enough for the task. The first to
complain about the expressive power of LTL was Wolper [105], who observed that LTL
cannot express certain $\omega$-regular events (in fact, LTL expresses precisely the star-free
$\omega$-regular events [95]). As was shown later [69], this makes LTL inadequate for compositional
verification, since LTL is not expressive enough to express assumptions about
the environment in modular verification. It is now recognized that a linear temporal
property logic has to be expressive enough to specify all $\omega$-regular properties [104].
What then should be the “ultimate” temporal property-specification language?

Several extensions to LTL have been proposed with the goal of obtaining full
$\omega$-regularity:

- Vardi and Wolper proposed ETL, the extension of LTL with temporal connectives
that correspond to $\omega$-automata [105,104]. ETL essentially combines two perspectives
on hardware specification, the operational perspective (finite-state machines)
with the behavioral perspective (temporal connectives). Experience has shown that
both perspectives are useful in hardware specification.

- Banieqbal and Barringer proposed extending LTL with fixpoint operators [4] (see
also [98]), yielding a linear $\mu$-calculus (cf. [52]), and

- Sistla, Vardi, and Wolper proposed QPTL, the extension of LTL with quantification
over propositional variables [92].

It is not clear, however, that any of these approaches provides an adequate solution from
a pragmatic perspective: implementing full ETL requires a complementation construction
for Büchi automata, which is still a topic under research [55]; fixpoint calculi are
notoriously difficult for users, and are best thought of as an intermediate language; and
full QPTL has a nonelementary time complexity [92].

Another problem with these solutions is the lack of temporal connectives to describe
past events. While such connectives are present in works on temporal logic by philosophers
(e.g., [85,78]), they have been purged by many computer scientists, who were
motivated by a striving for minimality, following the observation in [40] that in applications
with infinite future but finite past, past connectives do not add expressive power.
Somewhat later, however, arguments were made for the restoration of the past in temporal
logic. The first argument is that while past temporal connectives do not add any
expressive power, the price for eliminating them can be high. Many natural statements
in program specification are much easier to express using past connectives [87]. In fact,
the best known procedure to eliminate past connectives in LTL may cause a significant
blow-up of the considered formulas [69].

A more important motivation for the restoration of the past is again the use of tem- 
poral logic in modular verification. In global verification one uses temporal formulas 
that refer to locations in the program text [81]. This is absolutely verboten in modular 
verification, since in specifying a module one can refer only to its external behavior. 
Since we cannot refer to program locations, we have instead to refer to the history of the
computation, and we can do that very easily with past connectives [5].

We can summarize the above arguments for the extension of LTL with a quote 
by Pnueli [81]: “In order to perform compositional specification and verification, it is 
convenient to use the past operators but necessary to have the full power of ETL.” 

3.4 A Pragmatic Proposal 

The design of a temporal property-specification language in an industrial setting is
not a mere theoretical exercise. Such an effort was recently undertaken by a
formal-verification group at Intel. In designing such a language one has to balance competing
needs:

- Expressiveness: The logic has to be expressive enough to cover most properties 
likely to be used by verification engineers. This should include not only properties 
of the unit under verification but also relevant properties of the unit’s environment. 

- Usability: The logic should be easy to understand and to use for verification engi- 
neers. At the same time, it is important that the logic has rigorous formal semantics 
to ensure correct compilation and optimization and enable formal reasoning. 

- Closure: The logic should enable the expression of complex properties from simpler
ones. This enables maintaining libraries of properties and property templates.
Thus, the logic should be closed under all of its logical connectives, both Boolean
and temporal.

- History: An industrial tool is not developed in a vacuum. At Intel, there was already
a community of model-checking users, who were used to a certain temporal
property-specification language. While the new language was not expected to be
fully backward compatible, the users demanded an easy migration path.

- Implementability: The design of the language went hand-in-hand with the design 
of the model-checking tool [2]. In considering various language features, their im- 
portance had to be balanced against the difficulty of ensuring that the implementa- 
tion can handle these features. 

The effort at Intel culminated with the design of FTL, a new temporal property spec- 
ification language [3]. FTL is the temporal logic underlying ForSpec, which is Intel’s 
new formal specification language. A model checker with FTL as its temporal logic is 
deployed at Intel [2]. The key features of FTL are as follows: 

- FTL is a linear temporal logic, with a limited form of past connectives, and with
the full expressive power of $\omega$-regular languages,

- it is based on a rich set of logical and arithmetical operations on bit vectors to
describe state properties,

- it enables the user to define temporal connectives over time windows, 

- it enables the user to define regular events, which are regular sequences of Boolean 
events, and then relate such events via special connectives, 

- it enables the user to quantify universally over propositional variables, and 

- it contains constructs that enable the users to model multiple clock and reset signals, 
which is useful in the verification of hardware design. 

Of particular interest is the way FTL achieves full $\omega$-regularity. FTL borrows from
both ETL (as well as PDL [36]), by extending LTL with regular events, and from QPTL,
by extending LTL with universal quantification over propositional variables. Each of
these extensions provides us with full $\omega$-regularity. Why the redundancy? The rationale
is that expressiveness is not just a theoretical issue, it is also a usability issue. It is not
enough that the user is able to express certain properties; it is important that the user
can express these properties without unnecessary contortions. Thus, one need not shy
away from introducing redundant features, while at the same time attempting to keep
the logic relatively simple.

There is no reason, however, to think that FTL is the final word on temporal
property-specification languages. First, one would not expect to have an “ultimate” temporal
property-specification language any more than one would expect to have an “ultimate”
programming language. There are also, in effect, two competing languages. FormalCheck
uses a built-in library of automata on infinite words as its property-specification
language [63], while Cadence SMV^ uses LTL with universal quantification over
propositional variables. Our hope is that the publication of this paper and of [3], concomitant
with the release of an FTL-based tool to Intel users, would result in a dialog on the subject
of property-specification logic between the research community, tool developers,
and tool users. It is time, we believe, to close the debate on the linear-time vs.
branching-time issue, and open a debate on linear-time languages.

4 Discussion 

Does the discussion above imply that 20 years of research into CTL-based model
checking have led to a dead end? To the contrary! The key algorithms underlying symbolic
model checking for CTL are efficient graph-theoretic reachability and fair-reachability 
procedures (cf. [88]). The essence of the language-containment approach is that ver- 
ification of very general linear-time properties can be reduced to reachability or fair- 
reachability analysis, an analysis that is at the heart of CTL-based model-checking en- 
gines. Thus, a linear-time model checker can be built on top of a CTL model checker, 
as in Cadence SMV, leveraging two decades of science and technology in CTL-based 
model checking. 

It should also be stated clearly that our criticism of CTL is in the specific context
of property-specification languages for model checking. There are contexts in which
the branching-time framework is the natural one. For example, when it comes to the 

^ http://www-cad.eecs.berkeley.edu/~kenmcmil/smv/







synthesis of reactive systems, one has to consider a branching- time framework, since 
all possible strategies by the environment need to be considered [83,84]. Even when 
the goal is a simple reachability goal, one is quickly driven towards using CTL as a 
specification language [28]. 

Even in the context of model checking, CTL has its place. In model checking one
checks whether a transition system M satisfies a temporal formula φ. The transition
system M is obtained either by compilation from an actual design, typically expressed
using a hardware description language such as VHDL or Verilog, or is constructed
manually by the user using a modeling language, such as SMV’s SML [74]. In the latter
case, the user often wishes to “play” with M, in order to ensure that M is a good
model of the system under consideration. Using CTL, one can express properties such 
as AGEFp, which are structural rather than behavioral. A CTL-based model checker
enables the user to “play” with M by checking its structural properties. Since the
reachability and fair-reachability engine is at the heart of both CTL-based and
linear-time-based model checkers, we believe that the “ultimate” model checker should
have both a CTL front end and a linear-time front end, with a common reachability and
fair-reachability engine.
Abstract. Propositional (Boolean) logic is conceptually simple. It provides
a rich basis for the representation of finite structures, but is computationally
complex. Many current verification techniques are based on propositional
encodings.

Propositional representations lead to problems that are, in general,
computationally intractable. Nevertheless, data structures for representing
propositional formulae, and algorithms for reasoning about them, provide
generic tools that can be applied to a wide variety of computational
problems. Natural problem instances are often effectively solved by these
generic approaches.

There is a growing literature of algorithms for propositional reasoning,
and of techniques for propositional representation of tasks in areas ranging
from cryptography, constraint satisfaction and planning, to system
design, validation and verification.

We present a model-theoretic account of propositional encodings for
questions of logical validity. Validity is characterised model-theoretically.
For restricted logics, checking validity in a restricted class of models may
suffice. Classes of structures on a finite domain can be encoded as
propositional theories, and validity in such a class is encoded propositionally,
by means of a syntactic translation to a propositional formula.

This provides a unified setting for generating efficient propositional
encodings suitable for analysis using BDD or SAT packages.
Abstract. Checking for language containment between nondeterministic
ω-automata is a central task in automata-based hierarchical verification.
We present a symbolic procedure for language containment checking
between two Büchi automata. Our algorithm avoids determinization by
intersecting the implementation automaton with the complement of the
specification automaton as an alternating automaton. We present a fixpoint
algorithm for the emptiness check of alternating automata. The
main data structure is a nondeterministic extension of binary decision
diagrams that canonically represents sets of Boolean functions.



1 Introduction 

Binary decision diagrams (BDDs) have greatly extended the scope of systems
that can be verified automatically: instead of searching the entire state space
of a model, the verification algorithm works with a symbolic representation of
relevant state sets. Symbolic methods have been developed for many verification
problems, in particular for temporal logic model checking [CGP99].

For the language containment problem $\mathcal{L}(A) \subseteq \mathcal{L}(B)$ between
two ω-automata $A$ and $B$, symbolic algorithms have so far only been proposed in the
case where $B$ is deterministic [TBK95]. This is a serious restriction: in
property-oriented verification it is advantageous to allow for nondeterminism, since it
usually leads to simpler specifications (see [THB95] for examples). Having the
same type of automaton for $A$ and $B$ also makes hierarchical verification possible,
where an intermediate automaton appears as an implementation in one
verification problem and as a specification in the next; the verification can follow
a chain of increasingly more complex models and ensure that observable
properties are preserved.

The standard approach to the language containment check
$\mathcal{L}(A) \subseteq \mathcal{L}(B)$ is to first complement $B$, and then check
the intersection with $A$ for emptiness.

* This research was supported in part by the National Science Foundation grant
CCR-99-00984-001, by ARO grant DAAG55-98-1-0471, by ARO/MURI grant
DAAH04-96-1-0341, by ARPA/Army contract DABT63-96-C-0096, and by ARPA/AirForce
contracts F33615-00-C-1693 and F33615-99-C-3014.
The difficulty with this approach is that the classic constructions for the
complementation of ω-automata are all based on determinization. Determinization
algorithms for ω-automata, like Safra’s construction [Saf88], use an intricate
structure to describe deterministic states. Such states not only encode sets of
nondeterministic states reachable by the same input prefix, but also keep track
of the acceptance status of the nondeterministic computations. Safra trees have
been found to be too complex to be directly encoded in a BDD [THB95].

In our solution we sidestep the determinization construction by intersecting
$\mathcal{L}(A)$ and $\overline{\mathcal{L}(B)}$ not in their representation as
nondeterministic automata, but in the more general framework of alternating automata,
where complementation can be achieved by dualizing the transition function and
acceptance condition.
This approach makes use of concepts from a new complementation construction 
by Kupferman and Vardi [KV97]. The use of alternation not only simplifies the 
algorithm, it also allows us to combine the two automata before any analysis 
takes place. Thus, no effort is wasted on parts of B that are not reachable in the 
combined automaton. 

We describe a fixpoint algorithm that checks the resulting alternating au- 
tomaton for emptiness. This construction involves reasoning about sets of sets 
of states, one level of aggregation above the sets of states that can be repre- 
sented by a BDD. We therefore propose an extension to BDDs: by allowing the 
underlying automaton to be nondeterministic, sets of (deterministic) BDDs can 
be embedded in a single (nondeterministic) structure. 

Overview. In the following Section 2 we briefly survey related work. Sec- 
tion 3 provides background on automata over infinite words. We review de- 
terministic BDDs in Section 4 and present our nondeterministic extension in 
Section 5. In Section 6 we develop the fixpoint construction for the emptiness 
check on alternating automata. 

2 Related Work 

Language containment checking. There are two systems that provide completely
automatic language containment checking. Omega [BMUV97] is a package
of procedures related to ω-automata and infinite games over finite graphs.
Omega implements Safra’s construction and uses a completely explicit represen- 
tation of the state space. HSIS [THB95] is a partially symbolic implementation, 
again based on Safra’s construction. While the state space is still represented 
explicitly, HSIS makes auxiliary use of BDDs to represent relations on states. 

Simulation checking. Simulation is a strictly stronger property than 
language containment. Tools capable of simulation checking, such as Mocha
[AHM+98], can therefore be used to prove language containment (usually with
some user interaction), but a failed simulation check does not contradict lan- 
guage containment. 

Nondeterministic BDDs. There is a rich literature on extensions to
BDDs. In particular the idea to add nondeterminism has been exploited before,
but with a different objective: parallel-access diagrams [BD96] interpret
nondeterminism as disjunction to achieve a more compact representation of
certain Boolean functions. Takagi et al. [TNB+97] show that certain methods for
the satisfiability checking of combinatorial circuits and techniques that represent
Boolean functions as sets of product terms can be regarded as nondeterministic
BDDs.

Fig. 1. Büchi automata A and B. Accepting states are shown in gray.

Alternation. Muller and Schupp [MS87] observed that complementing an
alternating automaton corresponds to dualizing the transition function and
acceptance condition. The application of alternation in verification methods has
been studied both for automata-based algorithms [Var95] and in deductive
verification [MS00]. Alternating automata have been used in a new complementation
construction for Büchi automata [KV97].

3 Automata on Infinite Words 

Automata on infinite words differ from automata on finite words in their
acceptance mechanism: there are no final states; instead, acceptance is determined
w.r.t. the set of states that are visited infinitely often. Different types of
acceptance conditions are studied (see [Tho94] for an overview). In the following we
will work with Büchi conditions.

Definition 1. A (nondeterministic) Büchi automaton
$A = (\Sigma, Q, \Theta, \rho, \alpha)$ consists of a finite input alphabet $\Sigma$,
a finite set of states $Q$, a set of initial states $\Theta \subseteq Q$, a transition
function $\rho : Q \times \Sigma \to 2^Q$ and a set of accepting states
$\alpha \subseteq Q$.

A run of $A$ on an input string $l_0, l_1, \ldots \in \Sigma^\omega$ is an infinite
sequence of states $\sigma = v_0, v_1, \ldots$ s.t. $v_0 \in \Theta$ and for every
$i \geq 0$, $v_{i+1} \in \rho(v_i, l_i)$, i.e., the first state
is an initial state and each successor state is included in the successor set given
by the transition function.

A run is accepting if some accepting state is visited infinitely often. The
language $\mathcal{L}(A)$ of a Büchi automaton consists of those input strings that
have accepting runs.
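
The acceptance condition can be made concrete for ultimately periodic words of the form prefix · loop^ω. The following Python sketch is an illustration only, not part of the symbolic procedure of this paper: it uses an explicit state representation, and the names `accepts_lasso`, `RHO`, `INITIAL` and `ACCEPTING` as well as the two-state example automaton (which accepts words with infinitely many 0s and is not the automaton of Figure 1) are assumptions made for the example.

```python
def accepts_lasso(rho, initial, accepting, prefix, loop):
    """Return True iff the Buchi automaton accepts prefix . loop^omega.

    rho maps (state, letter) to a set of successor states.
    """
    if not loop:
        raise ValueError("loop must be non-empty")

    # States reachable after reading the finite prefix.
    current = set(initial)
    for letter in prefix:
        current = {q2 for q in current for q2 in rho.get((q, letter), ())}

    # Product graph: node (q, i) means "in state q, about to read loop[i]".
    def successors(node):
        q, i = node
        return {(q2, (i + 1) % len(loop)) for q2 in rho.get((q, loop[i]), ())}

    def reachable(starts):
        """Nodes reachable from `starts` in one or more steps."""
        seen, stack = set(), list(starts)
        while stack:
            for nxt in successors(stack.pop()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    starts = {(q, 0) for q in current}
    candidates = starts | reachable(starts)
    # Accept iff some reachable node with an accepting state lies on a cycle.
    return any(q in accepting and (q, i) in reachable([(q, i)])
               for (q, i) in candidates)


# Hypothetical example automaton: state "t" is entered exactly on letter 0.
RHO = {("s", "0"): {"t"}, ("s", "1"): {"s"},
       ("t", "0"): {"t"}, ("t", "1"): {"s"}}
INITIAL = {"s"}
ACCEPTING = {"t"}
```

Under these assumptions, `accepts_lasso(RHO, INITIAL, ACCEPTING, "1", "01")` holds because the loop passes through the accepting state on every iteration, whereas the word 0 · 1^ω is rejected.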

Example 1. The automaton $A$ in Figure 1 accepts all infinite words over the
alphabet {0, 1} that begin with 0 and contain infinitely many 0s. Since $B$ does
not accept the word $0^\omega$, $\mathcal{L}(A) \not\subseteq \mathcal{L}(B)$.
Fig. 2. Alternating automaton C and a computation for input word $0^\omega$.
Accepting states $\alpha = \{a, A, B, C, D\}$ are shown in gray, stable states
$\beta = \{a, b, c, A, C, D\}$ as boxes.



The branching mode in a nondeterministic automaton is existential; a word
is accepted if its suffix is accepted in one of the successor states. Alternating
automata combine existential branching with universal branching. Again, many
different acceptance conditions are studied. We will work with a combined Büchi
and co-Büchi condition.

Definition 2. An alternating automaton is a tuple
$A = (\Sigma, Q, \Theta, \rho, \alpha, \beta)$ with $\Sigma, Q, \alpha$ as before;
a set of stable states $\beta \subseteq Q$; a set of initial state sets
$\Theta \subseteq 2^Q$; and the transition function
$\rho : Q \times \Sigma \to 2^{2^Q}$, a function from states and alphabet
letters to sets of successor state sets.

A run of an alternating automaton is a directed acyclic graph (dag) $(N, E)$,
where the nodes are labeled with states $state : N \to Q$. It is often useful to view
the dag as a sequence of sets of nodes which we call slices: the i-th slice is the 
set of nodes that are reached after traversing i edges from root nodes. We call 
the set of states that occur on the nodes of the i-th slice the i-th configuration. 
Let configuration 0 be the root configuration, and, for finite segments of a run, 
call the first configuration source and the last configuration target. 

In a run for the input string $l_0, l_1, \ldots \in \Sigma^\omega$ the root
configuration is one of the sets in $\Theta$, and, for each state $v$ in the $i$-th
configuration, the set of states on successor nodes is one of the successor sets in
$\rho(v, l_i)$. A run is accepting if every path visits some $\alpha$-state infinitely
often, and eventually only visits states in $\beta$.

Finding a Büchi automaton that accepts the complement of a nondeterministic
Büchi automaton is complicated and leads to an exponential blow-up [Saf88].
Alternating automata can be complemented without blow-up by dualizing the 
transition function and acceptance condition [MS87]. Thus, it is also very sim- 
ple to construct an alternating automaton that accepts those words that are 
accepted by the first but not by the second automaton: 

Theorem 1. For two Büchi automata $A_1 = (\Sigma, Q_1, \Theta_1, \rho_1, \alpha_1)$,
$A_2 = (\Sigma, Q_2, \Theta_2, \rho_2, \alpha_2)$ (where $\rho_2(p, l) \neq \emptyset$
for all $p \in Q_2$, $l \in \Sigma$), the alternating automaton
$A = (\Sigma, Q, \Theta, \rho, \alpha, \beta)$ with

- $Q = Q_1 \cup Q_2$,

- $\Theta = \{ \Theta_2 \cup \{p\} \mid p \in \Theta_1 \}$,

- $\rho(s, a) =$ if $(s \in Q_1)$ then $\{ \{p\} \mid p \in \rho_1(s, a) \}$ else $\{ \rho_2(s, a) \}$,

- $\alpha = \alpha_1 \cup Q_2$,

- $\beta = (Q_2 \setminus \alpha_2) \cup Q_1$

accepts the language $\mathcal{L}(A) = \mathcal{L}(A_1) \cap \overline{\mathcal{L}(A_2)}$.
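
The construction of Theorem 1 is a direct set-level translation, which the following Python sketch spells out with explicit sets. The dictionary layout, the function name `difference_automaton` and the two small example automata `A1` and `A2` are assumptions made for this illustration; they are not the automata of Figure 1, and the paper itself works symbolically rather than with explicit sets.

```python
def difference_automaton(a1, a2):
    """Alternating automaton for L(a1) minus L(a2), following Theorem 1.

    Each Buchi automaton is a dict with keys 'alphabet', 'states', 'initial',
    'rho' (mapping (state, letter) to a set of states) and 'accepting'.
    The state sets of a1 and a2 are assumed to be disjoint.
    """
    q1, q2 = a1["states"], a2["states"]
    if q1 & q2:
        raise ValueError("state sets must be disjoint")

    rho = {}
    for s in q1 | q2:
        for letter in a1["alphabet"]:
            if s in q1:
                # Existential branching: one singleton set per successor.
                succ = a1["rho"].get((s, letter), set())
                rho[(s, letter)] = {frozenset({p}) for p in succ}
            else:
                # Universal branching: all successors of a2 at once
                # (Theorem 1 requires this set to be non-empty).
                rho[(s, letter)] = {frozenset(a2["rho"][(s, letter)])}

    return {
        "alphabet": a1["alphabet"],
        "states": q1 | q2,
        "initial": {frozenset(a2["initial"] | {p}) for p in a1["initial"]},
        "rho": rho,
        "accepting": a1["accepting"] | q2,        # alpha
        "stable": (q2 - a2["accepting"]) | q1,    # beta
    }


# Hypothetical example automata over the alphabet {0, 1}.
A1 = {"alphabet": {"0", "1"}, "states": {"a", "b"}, "initial": {"a"},
      "rho": {("a", "0"): {"a", "b"}, ("a", "1"): {"a"},
              ("b", "0"): {"b"}, ("b", "1"): set()},
      "accepting": {"b"}}
A2 = {"alphabet": {"0", "1"}, "states": {"A", "B"}, "initial": {"A"},
      "rho": {("A", "0"): {"A", "B"}, ("A", "1"): {"A"},
              ("B", "0"): {"B"}, ("B", "1"): {"A"}},
      "accepting": {"B"}}
ALT = difference_automaton(A1, A2)
```

In `ALT`, the nondeterministic choice of `A1` in state `a` on letter 0 appears as two singleton successor sets, while the same choice in `A2` becomes one successor set containing both states.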

Example 2. An alternating automaton for the language
$\mathcal{L}(C) = \mathcal{L}(A) \cap \overline{\mathcal{L}(B)}$ is
shown in Figure 2.

4 Binary Decision Diagrams 

A binary decision diagram (BDD) [Bry86] is a data structure for the representation
of Boolean functions $f : \mathbb{B}^n \to \mathbb{B}$. In their reduced and ordered
form, BDDs represent Boolean functions canonically for fixed variable orderings. For
many examples BDDs significantly outperform other representations. BDDs can be
used to store sets of states, represented by their characteristic function: Boolean
“or” corresponds to set union, “and” to intersection. BDDs are also used to rep- 
resent relations on states, such as the transition function of an automaton. This 
is done by adding a second “primed” copy for each variable. 

Definition 3 (BDD). A (deterministic) binary decision diagram (BDD)
$(V, Q, E_0, E_1, \phi)$ is a directed acyclic graph with internal nodes $Q$, edges
$E_0 \cup E_1$, a single root $\phi$ and two terminal nodes 0, 1. Each internal node
$n \in Q$ has exactly two departing edges $low(n) \in E_0$, $high(n) \in E_1$. Every
internal node $n \in Q$ is labeled with a variable $var(n) \in V$.

The successor nodes along the $low(n)$ and $high(n)$ edges are referred to as
the low and high successors of $n$. A BDD $d$ with root node $\phi$ defines a Boolean
function $f_d = f_\phi : \mathbb{B}^n \to \mathbb{B}$ as follows:

- the terminal node 1 defines the constant function true,

- the terminal node 0 defines the constant function false,

- an internal node $n \in Q$ represents the function

  $f$ : (if $var(n)$ then $f_1$ else $f_0$)

  where $f_0, f_1$ are the functions represented by the low and high successors,
  respectively.

Of special interest are BDDs in a canonical form called reduced and ordered. 

Definition 4. A BDD is ordered (OBDD), if on all paths through the graph the
labeling respects a given linear order on the variables $v_1 > v_2 > \cdots > v_n$; i.e.,
on all paths through the graph, smaller variables are traversed first. An OBDD
is reduced (ROBDD) if

1. no two different internal nodes have the same label and the same high and 
low successors, 

2. no internal node has identical high and low successor. 

Theorem 2. [Bry86] For any Boolean function $f : \mathbb{B}^n \to \mathbb{B}$ and a
given variable ordering, there is (up to isomorphism) exactly one ROBDD $d$ s.t.
$f_d = f$.
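
The two reduction conditions of Definition 4 and the canonicity stated in Theorem 2 can be observed with a small unique-table implementation. The sketch below is a minimal illustration under stated assumptions: the class `BDDManager`, its methods, the use of variable indices 0, ..., n-1 as the ordering (index 0 at the root), and the naive construction by exhaustive Shannon expansion are all choices made for the example and are exponential in the number of variables.

```python
class BDDManager:
    """Reduced ordered BDDs over variables 0, 1, ..., num_vars - 1."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}   # (var, low, high) -> node id
        self.nodes = {}    # node id -> (var, low, high)

    def mk(self, var, low, high):
        if low == high:
            # Condition 2: no node with identical high and low successor.
            return low
        key = (var, low, high)
        if key not in self.unique:
            # Condition 1: never create two nodes with the same label and
            # successors. Ids 0 and 1 are reserved for the terminals.
            node_id = len(self.unique) + 2
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def build(self, f, var=0, env=()):
        """ROBDD of the predicate f(tuple_of_bools) by Shannon expansion."""
        if var == self.num_vars:
            return 1 if f(env) else 0
        low = self.build(f, var + 1, env + (False,))
        high = self.build(f, var + 1, env + (True,))
        return self.mk(var, low, high)

    def evaluate(self, node, assignment):
        """Follow low/high edges according to `assignment` to a terminal."""
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if assignment[var] else low
        return bool(node)
```

With one manager, two syntactically different but equivalent predicates, for instance `(x[0] and x[1]) or x[2]` and its De Morgan dual, are mapped by `build` to the same node id, which is the practical consequence of canonicity.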
Fig. 3. Nondeterministic BDD (left) and three embedded deterministic BDDs.
Solid edges are high-successors, dotted edges low-successors.



5 Nondeterministic Binary Decision Diagrams 

For the analysis of alternating automata we need a more expressive representation
than BDDs. Sets of sets of states as they occur, for example, in the initial
condition or as sets of configurations, cannot be represented as a conventional
BDD. The extension we present in this section uses nondeterministic BDDs to
represent sets of Boolean functions. We interpret the nondeterministic BDD to
describe the set of all deterministic BDDs that can be embedded in it.

Example 3. Figure 3 shows a nondeterministic BDD and the three embedded
deterministic BDDs.

Nondeterministic BDDs may have more than one root node, and the out-degree
of internal nodes may be higher than two, so we consider the sets of High
and Low departing edges.

Definition 5. A nondeterministic binary decision diagram (NBDD)
$(V, Q, E_0, E_1, \Theta)$ is a directed acyclic graph with internal nodes $Q$, edges
$E_0 \cup E_1$, a set of root nodes $\Theta \subseteq Q$, and two terminal nodes 0, 1.
The set of departing edges from an internal node $n \in Q$ is partitioned into
$Low(n) \subseteq E_0$ and $High(n) \subseteq E_1$. Every internal node $n \in Q$ is
labeled with a variable $var(n) \in V$.

An NBDD $D$ with root set $\Theta$ defines a set of Boolean functions
$F_D = F_\Theta \subseteq (\mathbb{B}^n \to \mathbb{B})$ as follows:

- the terminal node 1 defines the set $F_1 = \{true\}$,

- the terminal node 0 defines the set $F_0 = \{false\}$,

- a set of nodes $\Theta$ defines the union of the sets represented by the individual
  nodes: $F_\Theta = \bigcup_{n \in \Theta} F_n$,

- for an internal node $n \in Q$, let $\mathcal{H}, \mathcal{L}$ denote the sets
  defined by its High and Low successors, respectively. Then $n$ defines the set:

  $F_n = \{ (\text{if } var(n) \text{ then } f_1 \text{ else } f_0) \mid f_0 \in \mathcal{L} \text{ and } f_1 \in \mathcal{H} \}$.
BDDs are therefore a special (deterministic) case of NBDDs: for a given
BDD $(V, Q, E_0, E_1, \phi)$ the NBDD $(V, Q, E_0, E_1, \{\phi\})$ characterizes the
singleton set containing the Boolean function defined by the BDD.
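
The semantics of Definition 5 can be executed directly on small diagrams by enumerating truth tables. In the following Python sketch the encoding is an assumption made for illustration: an internal node is a tuple `(var, Low, High)` with `var` a variable index and `Low`, `High` frozensets of successor nodes, the terminals are the integers 0 and 1, and a Boolean function is a tuple of truth values over all assignments in the order produced by `itertools.product`. The function name `functions` and the example nodes are hypothetical.

```python
from itertools import product


def functions(roots, num_vars):
    """Set of truth tables defined by the NBDD with root node set `roots`."""
    assignments = list(product([False, True], repeat=num_vars))

    def of_node(node):
        if node == 1:
            return {tuple(True for _ in assignments)}
        if node == 0:
            return {tuple(False for _ in assignments)}
        var, low_set, high_set = node
        lows, highs = of_set(low_set), of_set(high_set)
        # (if var then f1 else f0) for every f0 in lows and f1 in highs.
        return {
            tuple(f1[i] if a[var] else f0[i]
                  for i, a in enumerate(assignments))
            for f0 in lows for f1 in highs
        }

    def of_set(nodes):
        result = set()
        for node in nodes:
            result |= of_node(node)
        return result

    return of_set(roots)


# Hypothetical example over variables x (index 0) and y (index 1).
NODE_Y = (1, frozenset({0}), frozenset({1}))          # the function y
ROOT = (0, frozenset({0}), frozenset({1, NODE_Y}))    # {x, x and y}
```

Here `functions({ROOT}, 2)` contains two truth tables, one for x and one for x ∧ y, while the deterministic diagram `NODE_Y` alone yields a singleton set, in line with the remark that BDDs are a special case of NBDDs.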

Definition 6. A BDD $d = (V, Q^d, E_0^d, E_1^d, \phi^d)$ is embedded in an NBDD
$D = (V, Q^D, E_0^D, E_1^D, \Theta^D)$ iff there is a simulation function $\gamma$ from
the nodes of $d$ to the nodes of $D$ with $\gamma(0) = 0$, $\gamma(1) = 1$,
$\gamma(\phi^d) \in \Theta^D$, and for all nodes $n \in Q^d$,
$var(n) = var(\gamma(n))$, if $n'$ is the $low^d$-successor of $n$ then $\gamma(n')$
is a $Low^D$-successor of $\gamma(n)$, if $n'$ is the $high^d$-successor of $n$ then
$\gamma(n')$ is a $High^D$-successor of $\gamma(n)$.

We say that two node sets $\Theta_1, \Theta_2$ in an NBDD are mutually exclusive iff
there is no BDD that is embedded in both the NBDD with root node set $\Theta_1$ and the
NBDD with root node set $\Theta_2$. The notions of ordered and reduced diagrams can
now be lifted to NBDDs:

Definition 7. An NBDD is ordered (ONBDD), if on all paths through the graph
the labeling respects a given linear order on the variables
$v_1 > v_2 > \ldots > v_n$. An ONBDD is reduced (RONBDD) if

1. no two different internal nodes have the same label and the same High and
Low successor sets,

2. the High and Low successor sets of an internal node are mutually exclusive.



Theorem 3. Let $d$ be an OBDD and $D$ an ONBDD with the same variable order.
$d$ is embedded in $D$ iff $f_d \in F_D$.

Proof. By structural induction. □ 

RONBDDs are not a canonical representation of sets of Boolean functions.
To achieve canonicity, more restrictions on the grouping of functions
(if $v$ then $f_1$ else $f_0$) that have a common negative cofactor $f_0$ or a common
positive cofactor $f_1$ are necessary. One such restriction, which we will call the
negative-normal form, is to require that the functions are grouped by their
negative cofactors $f_0$.

Definition 8. A RONBDD $D = (V, Q, E_0, E_1, \Theta)$ is in negative-normal form
iff the following holds for all nodes $n \in Q$:

1. there is only one low-successor: $Low(n) = \{low(n)\}$,

2. the low-successor is a BDD,

3. no two different High-successors or root nodes are labeled by the same
variable and have the same low-successor.



Theorem 4. For any set of Boolean functions
$F \subseteq (\mathbb{B}^n \to \mathbb{B})$ and a given variable
ordering, there is (up to isomorphism) exactly one RONBDD $D$ in negative-normal
form s.t. $F_D = F$.




Language Containment Checking with Nondeterministic BDDs 






Proof. We show, by induction on $m$, that for any subset of the set of variables
$\{v_1, \ldots, v_m\} \subseteq \{v_1, \ldots, v_n\}$ (with variable order
$v_1 > v_2 > \cdots > v_n$), any set of functions
$F_m \subseteq (\mathbb{B}^n \to \mathbb{B})$ that only depend on variables in
$\{v_1, \ldots, v_m\}$ can be canonically represented by a RONBDD in negative-normal
form. In the following we will assume sharing of subgraphs, and identify NBDDs by
their root node sets, BDDs by their root node.

$m = 0$: There are four different sets of functions not depending on any
variable: $\emptyset$, {true}, {false}, {true, false}. These sets are uniquely
represented by the RONBDDs with root node sets $\emptyset$, {1}, {0}, {0, 1},
respectively.

$m \to m + 1$: We construct the set of root nodes $\Gamma$ for a set $F_{m+1}$, where
$v_{m+1}$ is the least variable some function in $F_{m+1}$ depends on. For each
function $f$ in $F_{m+1}$ we consider the positive and negative cofactor
$f^b(x_1, \ldots, x_{m+1}, \ldots, x_n) = f(x_1, \ldots, x_m, b, x_{m+2}, \ldots, x_n)$,
$b \in \mathbb{B}$ [the $(m+1)$st argument is replaced by $b$]. This allows
us to separate the subset of functions $A$ that do not depend on $v_{m+1}$:

$A = \{ f \mid f \in F_{m+1} \text{ and } f^0 = f^1 \}$.

For all other functions we separate the positive and negative cofactor in the
following set of pairs:

$B = \{ (f^0, f^1) \mid f \in F_{m+1} \text{ and } f^0 \neq f^1 \}$.

Next, we group the positive cofactors by the negative cofactors:

$C = \{ (f, X) \mid \exists g .\ (f, g) \in B,\ X = \{ g \mid (f, g) \in B \} \}$.

The resulting sets of positive cofactors contain only functions that do not
depend on $v_{m+1}$. The same holds for the set of functions in $A$. By the induction
hypothesis, we can therefore find negative-normal RONBDDs as follows:

$D = \{ (d_f, D_X) \mid d_f$ is the canonical ROBDD for $f$,
        $D_X$ is the root node set of the canonical RONBDD
        for $X$ with $(f, X) \in C \}$,

$E$ = the root node set of the canonical RONBDD for $A$.

Finally, we can construct the set of root nodes for $F_{m+1}$:

$\Gamma = \{ (var = v_{m+1},\ low = d_f,\ High = D_X) \mid (d_f, D_X) \in D \} \cup E$.

The constructed NBDD is ordered, reduced and in negative-normal form since
the NBDDs in $D$ and $E$ are, and the newly constructed nodes maintain all
conditions. It remains to show that the RONBDD is unique.

Assume there was a different negative-normal RONBDD with root node set
$\Gamma'$ defining $F_{m+1}$. Consider the functions in $F_{m+1}$ that do not depend on
$v_{m+1}$: since the High and Low successors of any node must be mutually exclusive,
they cannot be contained in the set represented by a node labeled by $v_{m+1}$
(reducedness). By the induction hypothesis we know that the set of all nodes in
$\Gamma$ that are not labeled by $v_{m+1}$ is canonical (the functions represented by
the subset depend only on greater variables). Thus $\Gamma$ and $\Gamma'$ must differ
in nodes that are both labeled by $v_{m+1}$.

Suppose there are two functions $f_1, f_2$ that are characterized by the same root
node in one diagram but by two different root nodes in the other. All functions
characterized by the same node in a RONBDD in negative-normal form have the
same negative cofactor (conditions 1 and 2 and Theorem 2). Thus the diagram
that represents them on two different nodes cannot be in negative-normal form
(condition 3). □
Union(N, M)
    R ← (N ∪ M) ∩ {0, 1}
    for all n ∈ N
        if ∃ m ∈ M . var(n) = var(m), low(n) = low(m)
            then R ← R ∪ { (var(n), low(n), Union(High(n), High(m))) }
            else R ← R ∪ {n}
    for all m ∈ M
        if ∄ n ∈ N . var(n) = var(m), low(n) = low(m)
            then R ← R ∪ {m}
    return R



Fig. 4. Operation Union, computing the union of two sets represented by 
negative-normal NBDDs. 



It is straightforward to implement traditional BDD operations (like the ap- 
plication of boolean operations, variable substitution, or quantification) and set 
operations on NBDDs. As an example, consider Union, shown in Figure 4. 
We assume sharing of subgraphs and identify BDDs with their root nodes and 
NBDDs with their root node sets. The Union operation computes the negative- 
normal RONBDD representing the union of two sets represented by two negative- 
normal RONBDDs. This is done by considering corresponding nodes in the two 
root node sets. Two nodes correspond if they are labeled with the same variable
and have the same low-successor. The union is computed by recursing on pairs of
corresponding nodes and simply adding nodes that do not have a corresponding 
node in the other set. 
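
As an illustration, the Union operation can be sketched in Python on an explicit encoding of negative-normal NBDDs. The encoding is an assumption of this example and not the data structure of an actual NBDD package: a node is a `Node(var, low, high)` tuple whose `low` field is the root of a deterministic BDD (or a terminal) and whose `high` field is a frozenset of nodes, terminals are the integers 0 and 1, and a diagram is identified with the frozenset of its root nodes. Structural equality of tuples plays the role of subgraph sharing.

```python
from typing import NamedTuple


class Node(NamedTuple):
    var: str           # variable label
    low: object        # root of a deterministic BDD, or terminal 0 / 1
    high: frozenset    # root node set of the High successors


def is_terminal(node):
    return isinstance(node, int)


def corresponds(n, m):
    """Same variable and same low-successor."""
    return n.var == m.var and n.low == m.low


def union(n_roots, m_roots):
    """Root node set representing the union of the two function sets."""
    # Terminal nodes of either root set are kept as they are.
    result = {r for r in n_roots | m_roots if is_terminal(r)}
    m_internal = [m for m in m_roots if not is_terminal(m)]
    n_internal = [n for n in n_roots if not is_terminal(n)]
    for n in n_internal:
        partner = next((m for m in m_internal if corresponds(n, m)), None)
        if partner is not None:
            # Corresponding nodes: merge their High successor sets.
            result.add(Node(n.var, n.low, union(n.high, partner.high)))
        else:
            result.add(n)
    for m in m_internal:
        if not any(corresponds(n, m) for n in n_internal):
            result.add(m)
    return frozenset(result)


# Hypothetical example: {x} united with {x and y}.
NODE_Y = Node("y", 0, frozenset({1}))
N = frozenset({Node("x", 0, frozenset({1}))})
M = frozenset({Node("x", 0, frozenset({NODE_Y}))})
```

Both example roots are labeled `x` and share the low-successor 0, so `union(N, M)` yields a single root whose High set contains the terminal 1 and the node for y.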

6 Emptiness of Alternating Automata 

As discussed in Section 3, the language containment problem between
nondeterministic Büchi automata is easily reduced to the emptiness problem of
alternating automata. In this section we develop a fixpoint algorithm for the
emptiness problem. The reachable configurations of an alternating automaton
can be computed in a forward propagation from $\Theta$. To decide if the finite dag
leading to such a configuration can be completed into an accepting run we 
identify gratifying segments, i.e., segments that would, if repeated infinitely 
often, form the suffix of an accepting run. 
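
The forward propagation of configurations can be sketched with explicit sets. This is only an illustration of the successor relation on configurations, under the assumption that `rho` maps a (state, letter) pair to a set of frozensets as in Definition 2; the function names and the small example automaton are hypothetical, and the algorithm of this paper performs the computation symbolically on NBDDs rather than by enumeration.

```python
from itertools import product


def successor_configurations(config, letter, rho):
    """All configurations reachable from `config` by reading `letter`.

    For every state in the configuration one successor set is chosen;
    the new configuration is the union of the chosen sets.
    """
    choices = [rho.get((v, letter), set()) for v in config]
    # A state without any successor set blocks the run: product is empty.
    return {frozenset().union(*pick) for pick in product(*choices)}


def reachable_configurations(alphabet, initial, rho):
    """Forward propagation from the initial state sets."""
    seen = set(initial)
    frontier = list(seen)
    while frontier:
        config = frontier.pop()
        for letter in alphabet:
            for nxt in successor_configurations(config, letter, rho):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


# Hypothetical alternating automaton over the one-letter alphabet {0}.
EXAMPLE_RHO = {
    ("a", "0"): {frozenset({"a"}), frozenset({"b"})},   # existential choice
    ("b", "0"): {frozenset({"b"})},
    ("A", "0"): {frozenset({"A", "B"})},                # universal branching
    ("B", "0"): {frozenset({"B"})},
}
EXAMPLE_INITIAL = {frozenset({"a", "A"})}
```

For this example the propagation stabilizes after two steps with three reachable configurations: the initial one, {a, A, B} and {b, A, B}.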

Gratifying segments. Consider an alternating automaton
$A = (\Sigma, Q, \Theta, \rho, \alpha, \beta)$. A run segment is a finite dag $(N, E)$,
where the nodes are labeled with states $state : N \to Q$, such that for each state
$v$ in a configuration, the set of states on successor nodes is one of the successor
sets in $\rho(v, l)$ for some input letter $l \in \Sigma$. We characterize gratifying
segments w.r.t. a complete preorder $\preceq$ on the states in the source configuration.
It will be helpful to identify nodes that are on some path from a source node $p$ to a
target node $p'$, s.t. $state(p) \simeq state(p')$; we call such nodes fixed. A run
segment $S$ is gratifying if
1. the source and target configuration are the same,

2. all fixed nodes are labeled by $\beta$-states,

3. all paths in $S$ visit a node with an $\alpha$-state,

4. all paths originating from a source node labeled by a state $p$ lead to nodes
in the target slice that are labeled with states equivalent to, or smaller than
$p$, and

5. all paths originating from a source node labeled by a state $p$ that visit a
non-fixed node lead to target nodes that are labeled with states strictly smaller
than $p$.

Example 4. The segment from slice 2 to slice 4 (configurations {a, D, A},
{b, D, B, C}, {a, D, A}) of the computation in Figure 2 is gratifying w.r.t. the
preorder a ^ D ^ A. In slices 2 and 4 all nodes are fixed; in slice 3 the nodes
labeled by b, D and C are fixed.

Lemma 1. Let $S$ be a gratifying run segment of an alternating automaton, and
$P$ a finite run prefix leading to the source slice of $S$. Then the dag
$G = P \cdot S^\omega$, constructed by appending an infinite number of copies of $S$
to $P$, is a computation of $A$.

Proof. All paths in $S$ visit an accepting state; the paths in $S^\omega$ therefore
visit an accepting state infinitely often. A path that does not visit a fixed node in
$S$ leads to a target node that is labeled by a strictly smaller state than the state
on the node it visited in the source slice. Thus, since there are only finitely many
states in the source configuration of $S$, every path in $S^\omega$ eventually visits a
fixed node. From there, a path can either (1) stay forever in fixed nodes (and
therefore in stable states) or (2) visit a non-fixed node and, again, lead to a target
node with a strictly smaller state. Hence, eventually (1) must occur. □

Lemma 2. Let $G$ be a computation of the alternating automaton $A$. There is a
preorder $\preceq$ s.t. $G$ can be partitioned into a finite prefix $P$ and an infinite
number of copies of a segment $S$ that is gratifying w.r.t. $\preceq$.

Proof. For the given computation $G$ we apply a ranking construction by Kupferman
and Vardi [KV97]. Consider the following sequence of subgraphs of $G$.

- $G_0 = G$.

- $G_{2i+1} = G_{2i}$ minus all nodes from which there are only finitely many nodes
reachable. Assign rank $2i$ to all the subtracted nodes.

- $G_{2i+2} = G_{2i+1}$ minus all nodes from which only nodes with $\beta$-states are
reachable. Assign rank $2i + 1$ to all the subtracted nodes.

$G_{2|Q|+1}$ is empty [KV97], i.e., the number of ranks is bounded. There must be
infinitely many occurrences of some configuration $x$, s.t. the nodes with the same
state label have the same rank in the two occurrences. We select two occurrences
s.t. all paths on the run segment $S$ between them visit an $\alpha$-state and a node
with odd rank. $S$ is a gratifying segment with the order $\preceq$ induced by the
ranking. The fixed states have odd rank, non-fixed states even rank. Along a path the
rank never increases. □
Annotated Configurations. To recognize gratifying segments we keep track
of the gratification conditions in configurations. An annotated configuration is a
tuple $(x, f, t, u, \preceq)$ where $x$ is a set of states, $f, t, u$ are subsets of
$x$, and $\preceq$ is a complete preorder on $x$. The goal is to capture the states on
fixed nodes in $f$, “trapped” states (i.e., states on nodes s.t. all originating paths
visit a fixed node) in $t$, and “fulfilling” states (i.e., states on nodes s.t. all
paths that originate from this node visit an $\alpha$-node) in $u$. We now introduce
constraints that ensure that these sets are propagated consistently in a sequence of
annotated configurations. Consider two consecutive configurations
$(x, f, t, u, \preceq)$ and $(x', f', t', u', \preceq')$. We
require that there exists a letter $l$ of the input alphabet s.t. for each state
$v \in x$ there is a set $y_v \in \rho(v, l)$ so that the following constraints are
satisfied:

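1. for all $v \in x$, $y_v \subseteq x'$,

2. for all $v' \in f'$ there is a $v \in f$ s.t. $v' \in y_v$,

3. for all $v \in f$, $f' \cap y_v \neq \emptyset$,

4. for all $v \in t - f$, $y_v \subseteq t'$,

5. for all $v \in u - \alpha$, $y_v \subseteq u'$,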

6. for all E f and v E x, s.t. G there is a le G / s.t. E yw and w ^ v, 

7. for all E f and all w' E x' with w' ^ there exists a v E f s.t. E yv 

and for all w E x with E yw , ^ ^ uj, and 

8. for all i; G / s.t. there is a w E x with w ^ v, there exists a E f with 

E yv s.t. for all E yw, • 

Let $Y$ be a set of annotated configurations. We say that an annotated
configuration $a$ is eventually accepting w.r.t. $Y$ iff there is a sequence of
annotated configurations, where $a$ is the first and some $b \in Y$ the last
configuration, and where every two consecutive configurations satisfy the constraints
above. Let EventualAccept($Y$) denote the set of annotated configurations that are
eventually accepting w.r.t. $Y$.

Lemma 3. Let $S$ be a gratifying segment leading from a configuration $x$ back to
$x$; then there is an annotation for the source configuration
$a = (x, f, t = x, u = x, \preceq)$ and an annotation for the target configuration
$b = (x' = x, f' = f, t' = f, u' = x \cap \alpha, \preceq)$ s.t. for every set $Y$ of
annotated configurations that includes $b$, $a \in$ EventualAccept($Y$).

Proof. First, we construct a segment $S'$ in which every path visits a fixed node
(by appending as many copies of $S$ as needed). For each slice $s$ in $S'$ we define
the following annotated configuration $(x_s, f_s, t_s, u_s, \preceq_s)$:

- $x_s$ contains the states on nodes in $s$,

- $f_s$ contains exactly the states on the fixed nodes in $s$,

- $t_s$ contains exactly the states on those nodes for which all paths that originate
from the node visit a fixed node,

- $u_s$ contains exactly the states on those nodes for which all paths that
originate from the node visit an $\alpha$-node,
- $\preceq_s$ is the following preorder:

for two states v,w on nodes p, q that are both in fg or both in Xg — fg, 

V -ig w if[ there is a target node q' reachable from q s.t. for all target nodes 
y reachable from p, state{p^) ^ state{q^)\ 

for two states v ^ fg,w ^ Xg — fg ov v ^ Xg — fg,w ^ fg on nodes p,q, 

V -ig w if[ there is a target node q^ reachable from q s.t. for all target nodes 
p' reachable from p, state{p^) ^ state{q^). 

The resulting sequence of annotated configurations satisfies the constraints.
Due to space limitations we skip the detailed argument here. □

Let Unmark(X) denote the set of annotated configurations s.t. (x, f, f, x ∩ α,
⪯) ∈ Unmark(X) for (x, f, t, u, ⪯) ∈ X. Let Filter(X) be the subset of the
set of annotated configurations X s.t. u = x, t = x.

Lemma 4. Let a = (x, f, t, u, ⪯) be an annotated configuration in a set Y s.t.
a ∈ Filter(EventualAccept(Unmark(Y))). Then there is a gratifying segment S
that leads from configuration x to x.

Proof. Because of constraint (1) there is a run segment S corresponding to the
sequence of configurations in the construction of EventualAccept. We show
that S is gratifying. For a slice s, let ($x_s$, $f_s$, $t_s$, $u_s$, $\preceq_s$)
denote the corresponding annotated configuration.

Claim 1: For two nodes p^q in the same slice s, if state{q) ^g state{p), 
state [p) E fg then there is a path from p to a node p^ in the target slice labeled 
by an /-state, s.t. for all nodes q^ in the target slice that can be reached from q, 
state[q^) ^ state{p^). 

Proof by induction on the length of S using constraint (8).

Claim 2: For two nodes p^,q^ in the same slice 5, if state[p') ^g state{q'), 
state [p^) E fs then there is a path from a sonrce node p, with state (p) E /, to 
p\ s.t. for all nodes q in the sonrce slice that can reach q\ state{p) ^ state{q). 
Proof by induction on the length of S using constraint (7).

Claim 3: For all nodes p^ in the target slice that are reachable from a source 
node p: state(p^) A state{p). 

Proof: Case (A): state{p) E /. Assume there is a path from p to a node p' 
in the target slice with state{p) P state{fi). Let be the node in the target 
s.t. statefifi) — state{p). By Claim 2, there is a path from a node q in the 
source slice with state{q) E f to q' with state{q) ^ state{p'), state{q) E f. 
Hence, state{q) ^ state{p) — statefifi). Let o^ be the node in the target slice 
s.t. state{o') — state{q). Again, using Claim 2, we can find a node in the source 
slice with an /-state that is smaller than state{q). Since this argument can be 
repeated infinitely often the source configuration must contain infinitely many 
different states. 

Case (B): state{p) ^ /. Let s^ be the first slice with a fgf-node pi on the path 
from p to p^, and s the slice with the non- fg predecessor po of pi. By constraint 
(6) there must be a /^ -predecessor pg ofpi, s.t. state(pQ) ^ state{pQ). By Claim 
2, there is a source node q with state{q) E f and state{q) ^ state{p). By case 
(A) all target nodes that are reachable from q are labeled by states smaller than
or eqnivalent to state{q). In particnlar, state{p^) A state{q) ^ state{p). 

Claim 4: For a node in some slice s with state{p') G fs there is a path from 
a sonrce node p to a target node p^^ with state{p) ^ state{p^') and state{p) G 
/, state{p'') G / that visits p' . 

Proof: An indnction on the size of the segment of S' up to 5 using constraint 
(2) and a second induction on the size of the segment beginning with s using 
constraint (3) shows that there is indeed a source node p ^ f and a target node 
p^^ G / s.t. p' is on a path between them. By Claim 3, state{p^^) A state{p). 
Now assume state{p^^) ^ state{p). By Claim 2, there is a source node ^ G / s.t. 
state{q) ^ state{q). Let o be the node in the target slice labeled by state{q). 
Again, using Claim 2, we can find a node in the source slice with an /-state 
smaller than state{q). Since this argument can be repeated infinitely often the 
source configuration must contain infinitely many different states. 

Claim 5: If there is a path from a source node p to a target node p^^ with 
state{p) ^ state{p^^), then for all nodes p^ on the path (where p^ is a node in slice 
5 ), state{p') G fs- 

Proof: Since all states in the source slice are contained in t, we know (because 
of constraint 4) that every path in S visits at least one /^/-node in some slice sF 
Consider the case that state{p) ^ /. Now let be the first slice with a /^/-node 
Pi that is visited on the path from p to p^F Let s be the previous slice containing 
po, the non- fs predecessor of pi. By constraint (6), there is a node qo in s, s.t. pi 
is a successor of qo, state{qo) G fs and state{qo) state{po). By Claim 2, there 
is a source node q s.t. state{q) ^ state{p) and there is a path from q to qo. Since 
p^^ is reachable from q, by Claim 3, state (p^^) A state{q). This is in contradiction 
to state{p) ^ state{p^^). 

Now consider the case that state{p) G /. Let be the first slice with a non- 
fs f-node Pi that is visited on the path from p to p^F Let s be the previous slice 
containing Po, the -predecessor of pi. By constraint (8) there is a node qi in 
s', s.t. Po is a predecessor of pi, state(qi) G fs>^ and state{pi) Ps' state[qi). By 
Claim 1, there is a target node q" s.t. state{m") P state{q"), and there is a path 
from qi to q" . Since q" is reachable from p, by Claim 3 state[q") A state[p). 
This again is in contradiction to state[p) ^ state{p"). 

Proof of the lemma: By Claims 4 and 5, the fixed nodes are exactly the nodes
labeled by $f_s$-states. Because of u = x and constraint 5, all paths in S visit
an α-node. By Claim 3, all paths lead to smaller or equivalent states in the
target. Paths that visit a non-fixed node lead to target nodes with strictly
smaller states by Claim 5. □

With these results we can now formulate the algorithm for the emptiness
check of alternating automata, shown in Figure 5. Let Reachable(𝒜) denote
the set of reachable configurations. Annotate(X) computes for a set of
configurations X a set of annotated configurations, s.t. for a configuration x
all annotations (x, f, f, x ∩ α, ⪯) are added where f ⊆ x ∩ β. We state the
correctness of the algorithm as the following two theorems.

Theorem 5. If L(𝒜) = ∅ then Empty(𝒜).
Empty(𝒜)

1 A ← ∅
2 B ← Annotate(Reachable(𝒜))
3 while (A ≠ B) do
4     A ← B
5     B ← B ∩ Filter(EventualAccept(Unmark(B)))
6 return (B = ∅)



Fig. 5. Fixpoint algorithm for the emptiness check of alternating automata.



Proof. Suppose there is an annotated configuration a = (x, f, t, u, ⪯) ∈ B. By
Lemma 4 there exists a gratifying segment L leading from configuration x to
x. Since x ∈ Reachable(𝒜) there is a run segment P leading from an initial
configuration to x. Thus, by Lemma 1, 𝒜 has a computation $P \cdot L^\omega$. □

Theorem 6. If Empty(𝒜) then L(𝒜) = ∅.

Proof. Suppose there is a computation G of 𝒜. By Lemma 2, G can be
partitioned into an initial segment P and an infinitely often repeated
gratifying segment L. Let x be the source configuration of L. x ∈ Reachable(𝒜).
By Lemma 3 there is an annotated configuration a = (x, f, t = x, u = x, ⪯) that
is included in EventualAccept(Y), if a' = (x, f, t' = f, u' = x ∩ α, ⪯) ∈ Y.
Since a ∈ Annotate(Reachable(𝒜)), a' ∈ Unmark(Y) if a ∈ Y, and
a ∈ Filter(Y) if a ∈ Y, a is included in every iteration of B. □
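
The loop of Figure 5 is an ordinary decreasing fixpoint iteration. The following Python sketch shows only that loop shape. The function `refine` is a hypothetical stand-in for Filter(EventualAccept(Unmark(·))), and the toy graph is an assumption made for illustration, not part of the construction above.

```python
def greatest_fixpoint(initial, refine):
    """Iterate B <- B & refine(B) until B no longer changes; return the final B."""
    previous, current = None, frozenset(initial)
    while previous != current:
        previous = current
        current = current & frozenset(refine(current))
    return current


# Toy stand-in: keep a node only if it has a successor inside the current set.
successors = {1: [2], 2: [1], 3: [1], 4: []}


def keep_nodes_with_successor(nodes):
    return {n for n in nodes if any(m in nodes for m in successors[n])}


survivors = greatest_fixpoint(successors, keep_nodes_with_successor)
is_empty = (survivors == frozenset())  # corresponds to "return (B = ∅)"
```

In the toy instance, node 4 is discarded in the first round and the remaining set is stable, so `is_empty` is false.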

7 Conclusions 

The data structures and algorithms presented in this paper are the basis of a
symbolic verification system for language containment. In comparison to the
classic construction, which starts with the determinization of the specification
automaton, our algorithm is both simpler and, for certain problems, more
efficient: because the two automata are combined early, no effort is wasted on
the determinization of parts of the specification automaton that are not
reachable in the intersection with the implementation automaton.

It should be noted, however, that our solution does not improve on the 
worst-case complexity of the standard algorithm. While first results with our 
prototype implementation are encouraging, advanced implementations and case 
studies are necessary to determine the characteristics of systems for which the 
symbolic approach is useful. The performance of NBDDs depends strongly on 
implementation issues like the constraints of the chosen normal form. 

Efficient representations of sets of Boolean functions are of interest beyond 
the language containment problem. An example is the state minimization of 
incompletely specified finite state machines [KVBSV94]: the standard algorithm 
computes sets of sets of (compatible) states. 

Abstract. In this paper we present an algorithm for determining satisfiability
of general Boolean formulas which are not necessarily in conjunctive normal
form. The algorithm extends the well-known Davis-Putnam algorithm to work on
Boolean formulas represented using Boolean Expression Diagrams (BEDs). The
BED data structure allows the algorithm to take advantage of the built-in
reduction rules and the sharing of sub-formulas. Furthermore, it is possible to
combine the algorithm with traditional BDD construction (using Bryant's
Apply-procedure). By adjusting a single parameter to the BedSat algorithm it is
possible to control to what extent the algorithm behaves like the
Apply-algorithm or like a SAT-solver. Thus the algorithm can be seen as
bridging the gap between standard SAT-solvers and BDDs. We present promising
experimental results for 566 non-clausal formulas obtained from the multi-level
combinational circuits in the ISCAS'85 benchmark suite and from performing
model checking of a shift-and-add multiplier.



1 Introduction 

In this paper we address the problem of determining satisfiability of
non-clausal Boolean formulas, i.e., formulas which are not necessarily in
conjunctive normal form. One area where such formulas arise is in formal
verification. For example, in equivalence checking of combinational circuits we
connect the outputs of the circuits with exclusive-or gates and construct a
Boolean formula for the combined circuits. The formula is satisfiable exactly
when the two circuits are not functionally equivalent.
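
As a small illustration of this reduction, the Python sketch below XORs the outputs of two circuits and searches for an input that makes the XOR true. The three example circuits and the exhaustive search are assumptions chosen for brevity. They are not taken from the benchmarks discussed later.

```python
from itertools import product


def circuit_a(x, y, z):
    # Reference circuit: majority of three inputs.
    return (x and y) or (x and z) or (y and z)


def circuit_b(x, y, z):
    # Restructured but functionally equivalent circuit.
    return (x and (y or z)) or (y and z)


def circuit_c(x, y, z):
    # Faulty variant: the product term (x and z) is missing.
    return (x and y) or (y and z)


def miter_witness(f, g, arity):
    """Return an input on which f XOR g is true, or None if there is none."""
    for bits in product([False, True], repeat=arity):
        if bool(f(*bits)) != bool(g(*bits)):
            return bits
    return None
```

An unsatisfiable XOR formula (`miter_witness` returns `None`) means the circuits are equivalent. A witness is a concrete input on which they differ.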

Another important area in which non-clausal formulas arise is in model
checking [1,4,5,6,24]. In bounded model checking, the reachable state space is
approximated by (syntactically) unfolding the transition relation and obtaining
a propositional formula which is not in clausal form. In order to check whether
the approximated state space R violates a given invariant I, one has to
determine whether the formula ¬I ∧ R is satisfiable.

Boolean Expression Diagrams (BEDs) [2,3] are an extension of Binary Decision
Diagrams (BDDs) [9] which allow Boolean operator vertices in the DAG. BEDs
can represent any Boolean formula in linear space at the price of being
non-canonical. However, since converting a Boolean formula into a BDD via a BED
can always be done at least as efficiently as constructing the BDD directly,
many of the desirable properties of BDDs are maintained.

Given a BED for a formula, one way of proving satisfiability is to convert
the BED to a BDD. The formula is satisfiable if and only if the resulting BDD
is different from the terminal 0 (a contradiction). BDDs have become highly
popular since they often are able to represent large formulas compactly.
However, by converting the BED into a BDD, more information is obtained than
just a "yes, the formula is satisfiable" or a "no, the formula is not
satisfiable" answer. The resulting BDD encodes all possible variable assignments
satisfying the formula. In some cases this extra information is not needed
since we may only be interested in some satisfying assignment or simply in
whether such an assignment exists. The canonicity of BDDs also means that some
formulas (such as the formulas for the multiplication function) cannot be
efficiently represented and thus the approach of converting the BED to a BDD
will be inefficient.

Instead of converting the BED to a BDD, one can use a dedicated satisfiability
solver such as Sato [25] or Grasp [17]. These tools are highly efficient in
finding a satisfying assignment if one exists. On the other hand, they are often
much slower than the BDD construction when the formula is unsatisfiable.
Another problem with these algorithms is that the Boolean formula must be given
in conjunctive normal form (CNF), and converting a general formula (whether
represented as a BED or as a Boolean circuit) to CNF is inefficient: either k
new variables are introduced (where k is the number of non-terminal vertices in
the BED) or the size of the CNF may grow exponentially in the size of the
formula.
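
The first option, one new variable per operator, can be sketched as follows. This is a Tseitin-style encoding written for illustration. The nested-tuple formula representation and the function name `tseitin` are assumptions, not the conversion used by the tools cited above.

```python
from itertools import count


def tseitin(formula):
    """Return (clauses, root_literal, ids) for a nested-tuple formula.

    Formulas: ("var", name), ("not", f), ("and", f, g), ("or", f, g).
    Literals are non-zero integers; a negative integer is a negated variable.
    """
    ids = {}            # variable name or operator sub-formula -> integer id
    fresh = count(1)
    clauses = []

    def lit(f):
        if f[0] == "var":
            if f[1] not in ids:
                ids[f[1]] = next(fresh)
            return ids[f[1]]
        if f[0] == "not":
            return -lit(f[1])
        if f in ids:    # shared sub-formula: reuse its variable
            return ids[f]
        a, b = lit(f[1]), lit(f[2])
        g = ids[f] = next(fresh)
        if f[0] == "and":   # clauses for g <-> (a and b)
            clauses.extend([[-g, a], [-g, b], [g, -a, -b]])
        else:               # clauses for g <-> (a or b)
            clauses.extend([[g, -a], [g, -b], [-g, a, b]])
        return g

    root = lit(formula)
    clauses.append([root])  # assert the formula itself
    return clauses, root, ids


example = ("or", ("and", ("var", "a"), ("var", "b")), ("not", ("var", "a")))
example_clauses, example_root, example_ids = tseitin(example)
```

The example has two binary operators and two input variables, so the encoding uses four variables. The clause count grows linearly with the number of operators, at the cost of enlarging the search space.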

The BedSat algorithm presented in this paper attempts to exploit the
advantages of the two above approaches. The algorithm extends the Davis-Putnam
algorithm to work directly on the BED data structure (thus avoiding the
conversion to CNF). By using the BED representation, the algorithm can take
advantage of the built-in reduction rules and the sharing of isomorphic
sub-formulas. For small sub-BEDs (i.e., for small sub-formulas), it turns out
that simply constructing the BDD and checking whether the result is different
from 0 is faster than running Davis-Putnam. In the BedSat algorithm, this
observation is used by having the user provide an input N to the algorithm. When
a sub-formula contains fewer than N BED vertices, the algorithm simply builds
the BDD for the sub-formula and checks whether the result is different from
the terminal 0. When using N = 0, the BedSat algorithm reduces to an
implementation of Davis-Putnam on the BED data structure (which is interesting
in itself) and when using N = ∞, the BedSat algorithm reduces to Bryant's
Apply-algorithm for constructing BDDs bottom-up. Experiments show that the
BedSat algorithm is significantly faster than both pure BDD construction and
the dedicated satisfiability solvers Grasp and Sato, both on satisfiable and
unsatisfiable formulas, when choosing a value of N of 400 vertices.
Related Work

Determining whether a Boolean formula is satisfiable is one of the classical NP- 
complete problems and algorithms for determining satisfiability have been stud- 
ied for a long time. The Davis-Putnam [11,12] SAT-procedure has been known for 
about 40 years and it is still considered one of the best procedures for determining 
satisfiability. More recently, incomplete algorithms like Greedy SAT (GSAT) [20] 
have appeared. These algorithms are faster than the complete methods, but by 
their very nature, they are not always able to complete with a definitive answer. 

Most SAT-solvers expect the input formula to be in CNF. However, Giunchiglia
and Sebastiani [13,19] have examined GSAT and Davis-Putnam for use
on non-CNF formulas. Although these algorithms avoid the explicit conversion of
the formula to CNF, they often implicitly add the same number of extra variables
which would have been needed if one converted the formula to CNF. Stålmarck's
method [21] is another algorithm which does not need the conversion to CNF.

BDDs [9] and variations thereof [10] have until recently been the dominating
data structures in the area of formal verification. However, recently researchers
have started studying the use of SAT-solvers as an alternative. Biere et al. [4,5,6]
introduce bounded model checking where SAT-solvers are used to find
counterexamples of a given depth in the Kripke structures. Abdulla et al. [1] and
Williams et al. [24] study SAT-solvers in fixed-point iterations for model
checking. Bjesse and Claessen [7] apply SAT-solvers to van Eijk's BDD-based
method [22] for verification without state space traversal.

2 Boolean Expression Diagrams 

A Boolean Expression Diagram [2,3] is a data structure for representing and ma- 
nipulating Boolean formulas. In this section we briefly review the data structure. 

Definition 1 (Boolean Expression Diagram). A Boolean Expression Diagram
(BED) is a directed acyclic graph G = (V, E) with vertex set V and edge
set E. The vertex set V contains three types of vertices: terminal, variable, and
operator vertices.

— A terminal vertex v has as attribute a value val(v) ∈ {0, 1}.

— A variable vertex v has as attributes a Boolean variable var(v), and two
children low(v), high(v) ∈ V.

— An operator vertex v has as attributes a binary Boolean operator op(v), and
two children low(v), high(v) ∈ V.

The edge set E is defined by

E = {(v, low(v)), (v, high(v)) | v ∈ V and v is a non-terminal vertex}.

The relation between a BED and the Boolean function it represents is
straightforward. Terminal vertices correspond to the constant functions 0 (false)
and 1 (true). Variable vertices have the same semantics as vertices of BDDs and
correspond to the if-then-else operator $x \to f_1, f_0$ defined as
$(x \wedge f_1) \vee (\neg x \wedge f_0)$. Operator vertices correspond to their
respective Boolean connectives. This leads to the following correspondence
between BEDs and Boolean functions:

Definition 2. A vertex v in a BED denotes a Boolean function $f^v$ defined
recursively as:

— If v is a terminal vertex, then $f^v = \mathit{val}(v)$.

— If v is a variable vertex, then $f^v = \mathit{var}(v) \to f^{\mathit{high}(v)}, f^{\mathit{low}(v)}$.

— If v is an operator vertex, then $f^v = f^{\mathit{low}(v)}\ \mathit{op}(v)\ f^{\mathit{high}(v)}$.
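
Definitions 1 and 2 can be mirrored almost literally in code. The Python sketch below is an illustration only: the class names and the `evaluate` function are assumptions, and it omits the vertex sharing and reduction rules of a real BED package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Terminal:
    val: int                          # 0 or 1


@dataclass(frozen=True)
class Variable:
    var: str
    low: "Vertex"
    high: "Vertex"


@dataclass(frozen=True)
class Operator:
    op: Callable[[int, int], int]     # a binary Boolean operator on 0/1
    low: "Vertex"
    high: "Vertex"


Vertex = Union[Terminal, Variable, Operator]


def evaluate(v: Vertex, env: Dict[str, int]) -> int:
    """Value of the function denoted by v under the assignment env."""
    if isinstance(v, Terminal):
        return v.val
    if isinstance(v, Variable):       # var(v) -> f^high(v), f^low(v)
        return evaluate(v.high, env) if env[v.var] else evaluate(v.low, env)
    return v.op(evaluate(v.low, env), evaluate(v.high, env))


# The formula x AND y, with each variable written as "x -> 1, 0".
x = Variable("x", Terminal(0), Terminal(1))
y = Variable("y", Terminal(0), Terminal(1))
x_and_y = Operator(lambda a, b: a & b, x, y)
```

A BED without any `Operator` vertex is a BDD in this representation, which is the observation the conversion strategies below rely on.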

A BDD is simply a BED without operators, thus a strategy for convert- 
ing BEDs into BDDs is to gradually eliminate the operators, keeping all the 
intermediate BEDs functionally equivalent. There are two very different ways 
of eliminating operators, called Up_All and Up_One. The Up_All algorithm 
constructs the BDD in a bottom-up way similar to the Apply algorithm by 
Bryant [9]. 

The Up_One algorithm is unique to BEDs and is based on repeated use of
the following identity (called the up-step):

$$(x \to f_1, f_0)\ \mathit{op}\ (x \to f'_1, f'_0) = x \to (f_1\ \mathit{op}\ f'_1),\ (f_0\ \mathit{op}\ f'_0), \qquad (1)$$

where op is an arbitrary binary Boolean operator, x is a Boolean variable, and
$f_i$ and $f'_i$ (i = 0, 1) are arbitrary Boolean expressions. This identity is
used to move the variable x above the operator op. In this way, it moves
operators closer to the terminal vertices and if some of the expressions $f_i$
are terminal vertices, the operators are evaluated and the BED simplified. By
repeatedly moving variable vertices above operator vertices, all operator
vertices are eliminated and the BED is turned into a BDD. (Equation (1) also
holds if the operator vertex op is a variable vertex. In that case, the up-step
is identical to the level exchange operation typically used in BDDs to
dynamically change the variable ordering [18].)
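
Identity (1) is easy to confirm exhaustively, since for a fixed value of x both sides select the same pair of sub-expressions. The Python check below is a sanity test written for illustration; the helper names are assumptions.

```python
from itertools import product


def ite(x, f1, f0):
    """The if-then-else operator x -> f1, f0 on 0/1 values."""
    return (x & f1) | ((1 - x) & f0)


def binary_operators():
    """Yield all 16 binary Boolean operators as truth-table lookups."""
    for table in product([0, 1], repeat=4):
        yield lambda a, b, t=table: t[2 * a + b]


def up_step_holds():
    """Check identity (1) for every operator and every choice of values."""
    for op in binary_operators():
        for x, f1, f0, g1, g0 in product([0, 1], repeat=5):
            lhs = op(ite(x, f1, f0), ite(x, g1, g0))
            rhs = ite(x, op(f1, g1), op(f0, g0))
            if lhs != rhs:
                return False
    return True
```

Here `g1` and `g0` play the role of the primed expressions in (1). Checking constant values suffices because the identity is pointwise in the remaining variables.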

The Up_One algorithm gradually converts a BED into a BDD by pulling up
variables one by one. The main advantage of this algorithm is that it can exploit
structural information in the expression. We refer the reader to [2,3,15,23] for a
more detailed description of Up_One, Up_All and their applications.

3 Satisfiability of Formulas in CNF 

A Boolean formula is in conjunctive normal form (or clausal form) if it is
represented as a conjunction (AND) of clauses, each of which is the disjunction
(OR) of one or more literals. A literal is either a variable or the negation of
a variable. The Davis-Putnam algorithm [11,12] (see Algorithm 1) determines
whether a Boolean formula φ in CNF is satisfiable. Line 1 is the base case where
φ is the empty set of clauses which represents "true." Line 3 is the
backtracking step where φ contains an empty clause which represents "false."
Line 5 handles unit
Algorithm 1 The basic version of Davis-Putnam. The function assign(l, φ)
applies the truth value of literal l to the CNF formula φ. The function
choose-literal(φ) selects a literal for DP to split on.

Name: DP φ

1: if φ is the empty set of clauses then
2:     return true
3: else if φ contains the empty clause then
4:     return false
5: else if a unit clause l occurs in φ then
6:     return DP(assign(l, φ))
7: else
8:     l ← choose-literal(φ)
9:     return DP(assign(l, φ)) ∨ DP(assign(¬l, φ))



clauses, i.e., clauses of the form x or ¬x. In this case, the value of the
variable in the unit clause l is assigned in all remaining clauses of φ using
the assign(l, φ) procedure. Lines 8 and 9 handle the general case where a
literal is chosen and the algorithm splits on whether the literal is true or
false. There are a number of different heuristics for choosing a "good" literal
in line 8 and the SAT-solvers based on Davis-Putnam differ by how they choose
the literals to split on. A simple heuristic is to choose the literal in such a
way that the assignments in line 9 produce the most unit clauses.
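
Algorithm 1 translates directly into a few lines of Python. In this sketch, a formula is a list of clauses, each clause is a frozenset of non-zero integers, and a negative integer denotes a negated variable. These encoding choices, and the naive `choose_literal`, are assumptions made for illustration rather than one of the heuristics used by actual solvers.

```python
def assign(lit, clauses):
    """Make lit true: drop satisfied clauses, delete the opposite literal."""
    return [c - {-lit} for c in clauses if lit not in c]


def choose_literal(clauses):
    """Naive placeholder heuristic: any literal of the first clause."""
    return next(iter(clauses[0]))


def dp(clauses):
    if not clauses:                          # line 1: empty set of clauses
        return True
    if any(len(c) == 0 for c in clauses):    # line 3: empty clause present
        return False
    for c in clauses:                        # line 5: unit clause
        if len(c) == 1:
            (unit,) = c
            return dp(assign(unit, clauses))
    lit = choose_literal(clauses)            # lines 8 and 9: split
    return dp(assign(lit, clauses)) or dp(assign(-lit, clauses))


satisfiable_cnf = [frozenset(c) for c in ([1, 2], [-1, 2], [-2, 3])]
unsatisfiable_cnf = [frozenset(c) for c in ([1, 2], [-1, 2], [1, -2], [-1, -2])]
```

Because Python's `or` short-circuits, the second branch in the split is explored only if the first one fails, which matches the backtracking behaviour described above.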

4 Satisfiability of Non-clausal Formulas 

Using BEDs, the effect of splitting on a literal is obtained by pulling a variable 
to the root using Up_One. After pulling a variable x up using Up_One, there 
are two situations: 

— The new root vertex contains the variable x. Both low and high children are 
BEDs. The formula is satisfiable if either the low child or the high child (or 
both) represents a satisfiable formula. 

— The new BED does not contain the variable x anywhere. The formula does 
not depend on x and we can pick a new variable to pull up. 

This suggests a recursive algorithm that pulls variables up one at a time. If the
algorithm at any point reaches the terminal 1, then a satisfying assignment has
been found (the path from the root to the terminal 1 gives the assignment). The
test for the empty set of clauses (line 1 in Algorithm 1) becomes a test for the
terminal 1. The test for whether φ contains the empty clause (line 3) becomes a
test for the terminal 0. It is not possible to test for unit clauses in the BED
and thus lines 5 and 6 have no correspondence in the BED algorithm. The only use
of the unit clause detection in the Davis-Putnam algorithm is to reduce the size
of the CNF representation. However, the BED data structure has a large number of
built-in reduction rules such as the distributive laws and the absorption laws [23].
Algorithm 2 The BedSat algorithm. The argument u is a BED. The function
choose-variable(u) selects a variable to split on.

Name: BedSat u

1: if u = 1 then
2:     return true
3: else if u = 0 then
4:     return false
5: else
6:     x ← choose-variable(u)
7:     u' ← Up_One(x, u)
8:     if u' is a variable x vertex then
9:         return BedSat low(u') ∨ BedSat high(u')
10:     else
11:         return BedSat u'



These reduction rules are applied each time a new BED vertex is created and
can potentially reduce the size of the representation considerably. Algorithm 2
shows the pseudo-code for the SAT-procedure BedSat.

The function choose-variable in line 6 of Algorithm 2 selects a variable to
split on. With a clausal representation of the formula, it is natural to pick the
variable in such a way as to obtain the most unit clauses after the split. This
gives the most reductions due to unit propagation. Although we do not have
a clausal representation of the formulas when using BEDs, it is still possible to
choose good candidate variables. In [23], several different heuristics for picking
good variable orderings for Up_One are discussed. The first variable in such an
ordering is probably a good variable to split on. In the prototype implementation,
a simple strategy has been implemented: the first variable encountered during
a depth-first search is used in the split. Notice that we do not need to split on
the variables in the same order along different branches, i.e., it is not necessary
to choose a single global variable ordering. Thus, the variable ordering can be
adjusted to each sub-BED as the algorithm executes.

In line 9 the algorithm branches out in two: one branch for the low child and 
one for the high child. If a satisfying assignment is found in one branch (and 
thus BedSat returns true), it is not necessary to consider the other branch. We 
have implemented a simple greedy strategy of first examining the branch with 
the smaller BED size (least number of vertices). 

An interesting feature of the BedSat algorithm is that it is possible to
compute the fraction of the state space that has been examined at any point in
the execution. It is known that the algorithm will terminate when 100% of the
state space has been examined (it may of course terminate earlier if a satisfying
assignment is found). Figure 1 shows graphically how to determine the fraction
of the state space that has been examined by the BedSat algorithm. The circles
correspond to splitting points and the triangles correspond to parts of the BED
which have (gray triangles) or have not (white triangles) been examined. The
numbers next to the triangles indicate the size of the state space represented by
Fig. 1. Illustration of how to determine the percentage of the state space that
has been examined. Each circle represents a split on a variable. The top circle is
the starting point. The triangles represent sub-BEDs; the white ones are as yet
unexamined while the gray ones have already been examined. Assume that there
are n variables in total and that the current position in BedSat corresponds to
the bottom circle. Then the fraction of the state space which has already been
examined is the sum of the sizes of the gray triangles divided by $2^n$.



each triangle assuming that there are n variables in total. The fraction of the
state space examined so far is determined by adding the numbers from the gray
triangles and dividing by $2^n$, which is the size of the complete state space.
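
The bookkeeping can be sketched in a few lines of Python. The sketch assumes that a sub-BED reached after d splits on distinct variables accounts for $2^{n-d}$ of the $2^n$ assignments. The function name and the list-of-depths input are illustrative choices and not part of the prototype.

```python
from fractions import Fraction


def examined_fraction(n, gray_depths):
    """Fraction of the 2**n assignments covered by fully examined sub-BEDs.

    gray_depths holds, for every examined (gray) sub-BED, the number of
    variables fixed by splits on the path from the root to that sub-BED.
    """
    examined = sum(2 ** (n - depth) for depth in gray_depths)
    return Fraction(examined, 2 ** n)


# Ten variables; one branch finished after 1 split, another after 3 splits.
progress = examined_fraction(10, [1, 3])
```

In the example, the two finished branches cover 512 + 128 of the 1024 assignments, i.e. 5/8 of the state space.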

Of course, the percentage of the state space that has been examined does
not say much about the time remaining in the computation. However, it does
allow us to detect whether the algorithm is making progress. One could imagine
a SAT-solver which jumps back a number of splits if the user felt that the
current choice of split variables did not produce any progress. This could also
be done automatically by tracking how the percentage changes over time. No or
little growth could indicate that the chosen sequence of variables to split on is
inefficient and the algorithm should backtrack and pick new split variables. Such
backtracking is called premature and the technique is used in many
implementations of Davis-Putnam. It works best for satisfiable functions since it
allows the search to give up on a particular part of the state space and
concentrate on other, and hopefully easier, parts. If a satisfying assignment is
found in the easy part of the state space, then the difficult part never needs
to be revisited. For unsatisfiable functions, the entire state space needs to be
examined, and giving up on one part just postpones the problems. The only hope
is that by choosing a different sequence of variables to split on, the BED
reduction rules collapse the difficult part of the state space.

The BedSat algorithm can be improved by combining it with traditional
BDD construction. As more and more splits are performed, more and more
Algorithm 3 The BedSat algorithm with cutoff size N. |u| is the number of
vertices in the BED u. Line 6 returns whether the BED u represents a satisfiable
function.

Name: BedSat u

1: if u = 1 then
2:     return true
3: else if u = 0 then
4:     return false
5: else if |u| < N then
6:     return (Up_All(u) ≠ 0)
7: else
8:     x ← choose-variable(u)
9:     u' ← Up_One(x, u)
10:     if u' is a variable x vertex then
11:         return BedSat low(u') ∨ BedSat high(u')
12:     else
13:         return BedSat u'



variables are assigned a value and the remaining BED shrinks. The BedSat
algorithm, as described above, continues this process until either reaching a
terminal 1, or the entire BED is reduced to 0 (i.e., the BED is unsatisfiable).
However, at some point it becomes more efficient to convert the BED into a
BDD from which it can be decided immediately whether the original formula is
satisfiable (the BDD is not 0) or the algorithm has to backtrack and continue
splitting (the BDD is 0).

As discussed in the introduction, there is a trade-off between building the
BDD and splitting on variables. The BDD construction computes too much
information and is slow for large BEDs. On the other hand, splitting on variables
tends to be slow when the depth is large. To be able to find the optimal point
between the BDD construction and splitting on variables, we use a cutoff size
N for the remaining BED; see Algorithm 3. If the size of the BED of a
sub-problem is less than the cutoff size, the BED is converted into a BDD,
otherwise we continue splitting. For large values of N, the revised version of
BedSat reduces to pure BDD construction. For N equal to 0, the revised algorithm
is identical to Algorithm 2.
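
The control flow of Algorithm 3 can be illustrated with a small Python sketch on formulas written as nested tuples. Several simplifications are assumptions of the sketch and not properties of the actual implementation: cofactoring with constant folding (`restrict`) stands in for Up_One, exhaustive evaluation (`solve_small`) stands in for building a BDD with Up_All, and only AND, OR and NOT are supported.

```python
from itertools import product


def fold(op, *args):
    """Rebuild a node, folding constants (a tiny subset of BED reductions)."""
    if op == "not":
        return 1 - args[0] if args[0] in (0, 1) else ("not", args[0])
    a, b = args
    absorbing = 0 if op == "and" else 1     # 0 absorbs AND, 1 absorbs OR
    if a == absorbing or b == absorbing:
        return absorbing
    if a == 1 - absorbing:                  # neutral element on the left
        return b
    if b == 1 - absorbing:                  # neutral element on the right
        return a
    return (op, a, b)


def restrict(u, x, val):
    """Cofactor of u with variable x fixed to val; constants are folded."""
    if isinstance(u, int):
        return u
    if isinstance(u, str):
        return val if u == x else u
    return fold(u[0], *(restrict(c, x, val) for c in u[1:]))


def variables(u):
    """Variables of u in depth-first order, without duplicates."""
    if isinstance(u, str):
        return [u]
    seen = []
    if isinstance(u, tuple):
        for c in u[1:]:
            seen += [v for v in variables(c) if v not in seen]
    return seen


def size(u):
    return 1 + sum(size(c) for c in u[1:]) if isinstance(u, tuple) else 1


def solve_small(u):
    """Stand-in for Up_All(u) != 0: try every assignment of a small formula."""
    names = variables(u)
    for bits in product((0, 1), repeat=len(names)):
        r = u
        for name, bit in zip(names, bits):
            r = restrict(r, name, bit)
        if r == 1:
            return True
    return False


def bed_sat(u, cutoff):
    u = restrict(u, None, 0)                # constant-fold the input
    if u == 1:
        return True
    if u == 0:
        return False
    if size(u) < cutoff:
        return solve_small(u)
    x = variables(u)[0]                     # first variable in depth-first order
    branches = sorted((restrict(u, x, 0), restrict(u, x, 1)), key=size)
    return any(bed_sat(b, cutoff) for b in branches)   # smaller branch first


contradiction = ("and", ("or", "a", "b"), ("and", ("not", "a"), ("not", "b")))
satisfiable = ("or", ("and", "a", "b"), ("not", "c"))
```

With `cutoff=0` the sketch always splits, as in Algorithm 2. With a very large cutoff it hands the whole formula to `solve_small`, mirroring the two extremes described above.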

5 Experimental Results 

To see how well BedSat works in practice, we compare it to other techniques
for solving satisfiability problems. Unfortunately, the standard benchmarks used
to evaluate SAT-solvers are all in CNF (see for example [14]). To compare the
performance of BedSat with existing algorithms on non-clausal Boolean
formulas, we obtain Boolean formulas from the circuits in the ISCAS'85 benchmark
suite [8] and from model checking [24].

We compare BedSat to the BDD construction algorithms Up_One and
Up_All, both using the Fanin variable ordering heuristic. Furthermore, we
compare BedSat with the state-of-the-art SAT-solvers Sato and Grasp. Since
both Sato and Grasp require their input to be in CNF, we convert the
BEDs to CNF although this increases the number of variables and thus also the
state space for Sato and Grasp.

The experiments are performed on a 450 MHz Pentium III PC running Linux. 
For the BDD construction in Algorithm 3 we use Up_All with the Fanin vari- 
able ordering heuristic [23]. All runs are limited to 32 MB of memory and 15 
minutes of CPU time. 

Table 1 shows the ISCAS'85 results. The ISCAS'85 benchmark consists of
eleven multi-level combinational circuits, nine of which exist both in a redundant
and a non-redundant version. Furthermore, the benchmark contains five circuits
that originally were believed to be non-redundant versions but it turned out
that they contained errors and were not functionally equivalent to the original
circuits [16]. The nine equivalent pairs of circuits correspond to 475 unsatisfiable
Boolean formulas (the circuits have several outputs) when the outputs of the
circuits are pairwise exclusive-or'ed. The first nine rows of Table 1 show the
runtimes to prove that the 475 formulas are unsatisfiable using Up_One, Up_All,
Sato, Grasp, and the BedSat algorithm with the cutoff size equal to 0, 100, 400
and 1000. Up_One and Up_All perform quite well on all nine circuits. The
SAT-solvers Sato and Grasp perform well on the smaller circuits (the numbers in
the circuit names indicate the size), but give up on several of the larger ones.
With 0 as cutoff size, BedSat does not perform well at all. The runtimes are an
order of magnitude larger than the runtimes for the other methods. The long
runtimes are due to BedSat's poor performance on some (but not all)
unsatisfiable formulas. Increasing the cutoff size to 100 or 400 improves
BedSat's performance. In fact, with cutoff size 400, BedSat yields runtimes
comparable to or better than all other methods except the case of c6288/nr with
Up_One.

The last five rows of Table 1 show the results for the erroneous circuits.
Here there are 340 Boolean formulas in total out of which 267 are unsatisfiable
and 73 are satisfiable. We indicate this with "S/U" in the second column. The
Up_One and Up_All methods take slightly longer on the erroneous circuits
since not all BDDs collapse to a terminal. The SAT-solvers (Sato, Grasp and
BedSat) perform considerably better on the erroneous circuits compared to
the correct circuits; sometimes going from impossible to possible as for Sato
and BedSat (with a cutoff size of 0 and 100) on c7552. BedSat is the only
SAT-solver to handle c3540 and it outperforms Sato and Grasp on c5315. On
c7552, BedSat is two orders of magnitude slower with a cutoff of 0, but yields
comparable results when the cutoff size increases.

Consider the case of BedSat on c3540 with a cutoff size of 0. In the correct
version of the circuits, BedSat uses 185 seconds. This number reduces to 35.9
seconds for the erroneous circuits. The c3540 circuit has 22 outputs, of which
five are faulty in the erroneous version. BedSat has no problem detecting the
errors in the five faulty outputs. In the correct version, about 149 seconds are
spent on proving those five outputs to be unsatisfiable. Another example is c1908
where, in the correct case, BedSat spends all the time (242 seconds) on one
unsatisfiable output. In the erroneous version the difficult output has an error
and the corresponding Boolean formula becomes satisfiable. BedSat finds a
satisfying assignment instantaneously (0.1 seconds).

Table 1. Runtimes in seconds for determining satisfiability of problems arising
in verification of the ISCAS ’85 benchmarks using different approaches. In the
“Result” column, “U” indicates unsatisfiable problems while “S/U” indicates
both satisfiable and unsatisfiable problems. Both Up_One and Up_All use the
Fanin variable ordering heuristic. The last four columns show the results for
BedSat. The numbers 0, 100, 400 and 1000 indicate the cutoff sizes in number
of vertices. A dash indicates that the computation could not be done within the
resource limits.

Description    Result  Up_One  Up_All  Sato   Grasp  BedSat
                                                     0      100    400    1000
c432/nr        U       2.1     1.7     0.5    0.4    36.4   3.5    1.4    1.4
c499/nr        U       4.3     1.8     1.8    1.4    17.8   16.7   1.7    1.7
c1355/nr       U       4.3     1.8     1.8    1.5    18.1   16.5   1.7    1.7
c1908/nr       U       0.7     0.6     0.4    0.4    242    11.1   0.2    0.2
c2670/nr       U       1.2     0.6     1.0    0.9    38.6   1.9    0.3    0.3
c3540/nr       U       32.3    39.2    -      -      185    133    10.9   16.3
c5315/nr       U       16.2    1.9     -      15.0   1.1    1.0    0.9    1.3
c6288/nr       U       2.7     -       -      -      -      -      -      -
c7552/nr       U       3.6     1.1     -      4.4    -      -      0.7    0.7
c1908/nr-err   S/U     0.7     0.6     0.4    0.4    0.1    0.1    0.2    0.2
c2670/nr-err   S/U     2.9     0.7     0.9    0.8    0.4    0.3    0.3    0.3
c3540/nr-err   S/U     42.8    40.2    -      -      35.9   15.8   4.6    6.5
c5315/nr-err   S/U     32.7    2.4     31.7   10.3   0.7    1.6    1.5    1.8
c7552/nr-err   S/U     8.1     1.8     2.5    2.6    176    2.0    1.3    1.3

By varying the cutoff size, we can control whether BedSat works mostly as
the standard BDD construction (when using a high cutoff size) or as a
Davis-Putnam SAT-solver (a low cutoff size). From Table 1 it is observed that
for the ISCAS circuits, a cutoff size of 400 seems to be the optimal value.
Using this value for the cutoff size, the BedSat algorithm outperforms both pure
BDD construction (Up_All and Up_One) and standard SAT-solvers (Sato, Grasp,
and BedSat with 0 cutoff) for all the larger circuits except c6288/nr and for
all the erroneous (and thus satisfiable) ISCAS circuits.

The Boolean formulas obtained from the ISCAS ’85 circuits have many identical
sub-formulas since they are obtained by comparing two similar circuits.
To test the BedSat algorithm on a more realistic example (at least from the
point of view of formal verification), we have extracted Boolean formulas that
arise during the fixed-point iteration when performing model checking of a 16-bit
shift-and-add multiplier [23]. Table 2 shows the results for the model checking
problems. The numbers 10, 20 and 30 indicate the output bit we are considering.
The word “final” indicates the satisfiability problem for the check of whether
the implementation satisfies the specification. The word “last_fp” indicates the
satisfiability problem for the last iteration in the fixed-point computation
(where it is detected that the fixed-point is reached). The word “second_last_fp”
indicates the satisfiability problem for the previous iteration in the
fixed-point iteration. The result column indicates whether the satisfiability
problem is satisfiable (S) or not (U).

Table 2. Runtimes in seconds for determining satisfiability of problems arising in
model checking using different approaches. In the “Result” column, “U” indicates
unsatisfiable problems while “S” indicates satisfiable problems. Both Up_One
and Up_All use the Fanin variable ordering heuristic. The BedSat experiments
have cutoff size 0 (i.e., no cutoff) except for the one marked with †, which has
cutoff size 400. A dash indicates that the computation could not be done within
the resource limits.

Description                 Result  Up_One  Up_All  Sato   Grasp  BedSat
mult_10_final               U       13.1    43.5    -      -      31.9†
mult_10_last_fp             U       10.3    -       0.1    0.1    0.2
mult_10_second_last_fp      S       -       -       0.1    0.1    0.1
mult_20_final               ?       -       -       -      -      -
mult_20_last_fp             U       -       -       0.1    0.1    0.1
mult_20_second_last_fp      S       -       -       0.5    40.9   0.5
mult_30_final               S       -       -       0.3    0.6    0.2
mult_30_last_fp             U       -       -       0.1    0.2    0.2
mult_30_second_last_fp      S       -       -       0.6    1.4    0.5
mult_bug_10_final           S       13.0    -       6.7    0.1    0.1
mult_bug_10_last_fp         U       9.9     -       0.1    0.1    0.2
mult_bug_10_second_last_fp  S       -       -       0.1    0.1    0.1
mult_bug_20_final           S       -       -       113    -      0.3
mult_bug_20_last_fp         U       -       -       0.1    0.1    0.1
mult_bug_20_second_last_fp  S       -       -       0.5    499    0.5
mult_bug_30_final           S       -       -       0.3    0.6    0.2
mult_bug_30_last_fp         U       -       -       0.1    0.2    0.2
mult_bug_30_second_last_fp  S       -       -       0.6    1.5    0.5

For the model checking problems, Up_One and Up_All perform very poorly.
Up_One is only able to handle four out of 18 problems and Up_All only handles
a single one. However, both Up_One and Up_All handle the mult_10_final
problem, which is difficult for the SAT-solvers. The SAT-solvers perform quite
well, both on the satisfiable and the unsatisfiable problems. Most of the
problems are solved in less than a second by all three SAT-solvers. While both
Sato and Grasp take a long time on a few of the problems, BedSat is more
consistent in its performance.

6 Conclusion 

This paper has presented the BedSat algorithm for solving the satisfiability
problem on BEDs. The algorithm adapts the Davis-Putnam algorithm to the
BED data structure. Traditional SAT-solvers require the Boolean formula to
be given in CNF, but BedSat works directly on the BED and thus avoids the
conversion of the formula to CNF, which either adds extra variables or may result
in an exponentially larger CNF formula. The BedSat algorithm is also able to
take advantage of the BED data structure by using the reduction rules from [23]
during the algorithm and by exploiting the sharing of sub-formulas.

We have described how the BedSat algorithm is combined with traditional
BDD construction. By adjusting a single parameter to the BedSat algorithm
it is possible to control to what extent the algorithm behaves like the
Apply-algorithm or like a SAT-solver. Thus the algorithm can be seen as bridging
the gap between standard SAT-solvers and BDDs.

We present promising experimental results for 566 non-clausal formulas
obtained from the multi-level combinational circuits in the ISCAS ’85 benchmark
suite and from performing bounded model checking of a shift-and-add multiplier.
For these formulas, the BedSat algorithm is more efficient than both pure
SAT-solvers (Sato and Grasp) and standard BDD construction. The combination
works especially well on formulas which are unsatisfiable and thus difficult for
pure SAT-solvers.

Abstract. In this paper, we present the design and the implementation
of a composite model checking library. Our tool combines different
symbolic representations, such as BDDs for representing boolean logic
formulas and polyhedral representations for linear arithmetic formulas,
with a single interface. Based on this common interface, these data
structures are combined using what we call a composite representation. We
used an object-oriented design to implement the composite symbolic
library. We imported CUDD (a BDD library) and Omega Library (a linear
arithmetic constraint manipulator that uses polyhedral representations)
to our tool by writing wrappers around them which conform to our
symbolic representation interface. Our tool supports polymorphic verification
procedures which dynamically select symbolic representations based on
the input specification. Our symbolic representation library forms an
interface between different symbolic libraries, model checkers, and
specification languages. We expect our tool to be useful in integrating
different tools and techniques for symbolic model checking, and in comparing
their performance.



1 Introduction 

In symbolic model checking, sets of states and transitions are represented
symbolically (implicitly) to avoid the state-space explosion problem
[BCM+90,McM93]. The success of symbolic model checking has been mainly due to
the efficiency of the data structures used to represent the state space. For
example, binary decision diagrams (BDDs) [Bry86] have been successfully used in
verification of finite-state systems which could not be verified explicitly due
to the size of the state space [BCM+90,McM93]. Linear arithmetic constraint
representations have been used in verification of real-time systems and
infinite-state systems [ACH+95,AHH96,BGP99,HRP94], which are not possible to
verify using explicit representations. Any data structure that supports
operations such as intersection, union, complement, equivalence checking and
existential quantifier elimination (used to implement relational image
computations) can be used as a symbolic representation in model checking. The
motivation is to find symbolic representations which can represent the state
space compactly to avoid the state-space explosion problem. However, symbolic
representations may have their own deficiencies. For example, BDDs are incapable
of representing infinite sets. On the other hand, linear arithmetic constraint
representations, which are capable of representing infinite sets, are expensive
to manipulate due to increased expressivity.

Generally, model checking tools have been built using a single symbolic
representation [McM93,AHH96]. The representation used depends on the target
application domain for the model checker. Inefficiencies of the symbolic
representation used in a model checker can be addressed using various
abstraction techniques, some ad hoc, such as restricting variables to finite
domains, some formal, such as predicate abstraction [Sai00]. These abstraction
techniques can be used independent of the symbolic representation. As model
checkers become more widely used, it is not hard to imagine that a user would
like to use a model checker built for real-time systems on a system with lots of
boolean variables and only a couple of real variables. Similarly, another user
may want to use a BDD-based model checker to check a system with few boolean
variables but lots of integer variables. Currently, such users may need to get a
new model checker for these instances, or use various abstraction techniques to
solve a problem which may not be suitable for the symbolic representation their
model checker is using. More importantly, as symbolic model checkers are applied
to larger problems, they are bound to encounter specifications with different
variable types which may not be efficiently representable using a single
symbolic representation.

In this paper we present a verification tool which combines several symbolic 
representations instead of using a single symbolic representation. Different sym- 
bolic representations are combined using the composite model checking approach 
presented in [BGL98,BGL00b]. Each variable type in the input specification is 
assigned to the most efficient representation for that variable type. The goal is 
to have a platform where strength of each symbolic representation is utilized as 
much as possible, and deficiencies of a representation are compensated by the 
existence of other representations. 

We use an object-oriented design for our tool. First we declare an interface
for symbolic representations. This interface is specified as an abstract class.
All symbolic representations are defined as classes derived from this interface.
We integrated CUDD and Omega Library into our tool by writing wrappers around
them which implement this interface. This makes it possible for our verifier to
interact with these libraries using a single interface. The symbolic
representations based on these tools form the basic representation types of our
composite library. Our composite class is also derived from the abstract
symbolic representation class. A composite representation consists of a
disjunction of composite atoms where each composite atom is a conjunction of
basic symbolic representations. The composite class manipulates this
representation to compute operations such as union, intersection, complement,
forward-image, backward-image, equivalence check, etc.

There have been other studies which use different symbolic representations
together. In [CABN97], Chan et al. present a technique in which (both linear
and non-linear) constraints are mapped to BDD variables (similar representations
were also used in [AB96,AG93]) and a constraint solver is used during
model checking computations (in conjunction with SMV) to prune infeasible
combinations of these constraints. Although this technique is capable of
handling non-linear constraints, it is restricted to systems where transitions
are either data-memoryless (i.e., the next state value of a data variable does
not depend on its current state value), or data-invariant (i.e., data variables
remain unchanged). Hence, even a transition which increments a variable (i.e.,
x' = x + 1) is ruled out. It is reported in [CABN97] that this restriction is
partly motivated by the semantics of RSML, and it allows modeling of a
significant portion of the TCAS II system.

In [BS00], a tool for checking inductive invariants on SCR specifications is
described. This tool combines automata based representations for linear arith- 
metic constraints with BDDs. This approach is similar to our approach but it is 
specialized for inductive invariant checking. Another difference is our tool uses 
polyhedral representations as opposed to automata based representations for lin- 
ear arithmetic. However, because of the modular design of our tool it should be 
easy to extend it with automata-based linear constraint representations. 

Symbolic Analysis Laboratory (SAL) is a recent attempt to develop a framework
for combining different tools in verifying properties of concurrent systems
[BGL+00a]. The heart of the tool is a language for specifying concurrent
systems in a compositional manner. Our composite symbolic library is a low-level
approach compared to SAL. We are combining different libraries at the symbolic
representation level as opposed to developing a specification language to
integrate different tools.

The rest of the paper is organized as follows. We explain the design of our 
composite symbolic library in Section 2. In Section 3, we describe the algorithms 
for manipulating composite representations. Section 4 presents the polymorphic 
verification procedure. In Section 5 we show the performance of the composite 
model checker on a simple example. Finally, in Section 6 we conclude and give 
some future directions. 

2 Composite Symbolic Library 

To combine different symbolic representations we use the composite model check- 
ing approach presented in [BGL98,BGL00b]. The basic idea in composite model 
checking is to map each variable in the input specification to a symbolic repre- 
sentation type. For example, boolean and enumerated variables can be mapped 
to BDD representation, and integers can be mapped to an arithmetic constraint 
representation. Then, each atomic event in the input specification is conjunc- 
tively partitioned where each conjunct specifies the effect of the event on the 
variables represented by a single symbolic representation. For example, one con- 
junct specifies the effect of the event on variables encoded using BDDs, whereas 
another conjunct specifies the effects of the event on variables encoded using
linear arithmetic constraints. We encode the sets of system states as a disjunction
of conjunctively partitioned type specific representations (e.g., a disjunct may 
consist of a boolean formula stored as a BDD representing the states of boolean 
and enumerated variables, and a linear arithmetic constraint representation rep- 
resenting the states of integer variables). The forward and backward image com- 
putations are computed independently for each symbolic representation by ex- 
ploiting the conjunctive partitioning of the atomic events. We also implement 
algorithms for intersection, union, complement and equivalence checking compu- 
tations for the disjunctive composite representation that use the corresponding 
methods for different symbolic representations. The key observation here is the 
fact that conjunctive partitioning of the atomic events allows forward and back- 
ward image computations to distribute over different symbolic representations. 




Fig. 1. Architecture of the composite model checker 



Our current implementation of the composite symbolic library uses two symbolic
representations: BDDs and a polyhedral representation for Presburger arithmetic
formulas. For the BDD representations we use the Colorado University
Decision Diagram Package (CUDD) [CUD]. For the Presburger arithmetic formula
manipulation we use the Omega Library [KMP+95,Ome]. Fig. 1 illustrates
a general picture of our composite model checking system. We will focus on the
symbolic library and verifier parts of the system in this paper.

We implemented our composite symbolic library in C++, and Fig. 2 shows
its class hierarchy as a UML class diagram¹. The abstract class Symbolic serves
as an interface to all symbolic representations including the composite
representation. Our current specification language supports enumerated, boolean,
and integer variables. Our system maps enumerated variables to boolean variables.
The classes BoolSym and IntSym are the symbolic representations for boolean
and integer variable types, respectively. Class BoolSym serves as a wrapper for
the BDD library CUDD [CUD]. It is derived from the abstract class Symbolic.
Similarly, IntSym is also derived from abstract class Symbolic and serves as a
wrapper for the Omega Library [Ome].

¹ In UML class diagrams, triangle arcs denote generalization, diamond arcs denote
aggregation, dashed arcs denote dependency, and solid lines denote association among
classes.

Fig. 2. Class diagram for the composite symbolic library

The class CompSym is the class for composite representations. It is derived 
from Symbolic and uses IntSym and BoolSym (through the Symbolic interface) 
to manipulate composite representations. Note that this design is an instance of 
the composite design pattern given in [GHJV94]. 
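
The class structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's C++ code: FiniteSetSym is a hypothetical stand-in for BoolSym and IntSym that stores an explicit finite set of values instead of wrapping CUDD or the Omega Library, and only three of the interface methods are shown.

```python
from abc import ABC, abstractmethod


class Symbolic(ABC):
    """Common interface shared by basic and composite representations."""

    @abstractmethod
    def intersect(self, other): ...

    @abstractmethod
    def union(self, other): ...

    @abstractmethod
    def is_satisfiable(self): ...


class FiniteSetSym(Symbolic):
    """Hypothetical basic representation: an explicit finite set of values."""

    def __init__(self, values):
        self.values = frozenset(values)

    def intersect(self, other):
        return FiniteSetSym(self.values & other.values)

    def union(self, other):
        return FiniteSetSym(self.values | other.values)

    def is_satisfiable(self):
        return len(self.values) > 0


class CompSym(Symbolic):
    """Composite: a disjunction (list) of atoms; each atom is a tuple holding
    one Symbolic object per variable type, read as a conjunction."""

    def __init__(self, atoms):
        self.atoms = list(atoms)

    def intersect(self, other):
        # Pairwise intersection of atoms, delegated to each basic representation.
        return CompSym([tuple(x.intersect(y) for x, y in zip(a, b))
                        for a in self.atoms for b in other.atoms])

    def union(self, other):
        # Union concatenates the two lists of disjuncts.
        return CompSym(self.atoms + other.atoms)

    def is_satisfiable(self):
        # Some atom must have every component satisfiable.
        return any(all(s.is_satisfiable() for s in atom) for atom in self.atoms)


A = CompSym([(FiniteSetSym({True}), FiniteSetSym({1, 2, 3}))])
B = CompSym([(FiniteSetSym({True, False}), FiniteSetSym({3, 4})),
             (FiniteSetSym({False}), FiniteSetSym({1}))])
C = A.intersect(B)
print(len(C.atoms), C.is_satisfiable())
```

Because CompSym only calls methods of the Symbolic interface, a new basic representation can be added without touching the composite class, which is the point of the composite pattern here.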

To verify a system with our tool, one has to specify its initial condition, tran- 
sition relation, and state space using a set of composite formulas. The syntax of 
a composite formula is defined as follows: 

\[
\begin{array}{rcl}
CF & ::= & CF \wedge CF \mid CF \vee CF \mid \neg CF \mid BF \mid IF \\
BF & ::= & BF \wedge BF \mid BF \vee BF \mid \neg BF \mid Term_{bool} \\
IF & ::= & IF \wedge IF \mid IF \vee IF \mid \neg IF \mid Term_{int} \; Rop \; Term_{int} \\
Term_{bool} & ::= & id_{bool} \mid true \mid false \\
Term_{int} & ::= & Term_{int} \; Aop \; Term_{int} \mid -Term_{int} \mid id_{int} \mid constant
\end{array}
\]

where CF, BF, IF, Rop, and Aop denote composite formula, boolean formula,
integer formula, relational operator, and arithmetic operator, respectively. Since
symbolic representations in our composite library currently support only boolean
and linear arithmetic formulas, we restrict arithmetic operators to + and - (we
actually allow multiplication with a constant). In the future, by adding new
symbolic representations we can extend this grammar.

A transition relation can be specified using a composite formula by using
unprimed variables to denote current state variables and primed variables to
denote next state variables. A method called registerVariables in BoolSym
and IntSym is used to register current and next state variable names during the
initialization of the representation.

Given a composite formula, the method constructFromSyntaxTree() in
CompSym traverses the syntax tree and calls the constructFromSyntaxTree() method
of BoolSym when a boolean formula is encountered and calls the
constructFromSyntaxTree() method of IntSym when an integer formula is
encountered. In CompSym, a composite formula, A, is represented in Disjunctive
Normal Form (DNF) as

\[ A = \bigvee_{i=1}^{n} \bigwedge_{j=1}^{t} a_{ij} \]

where $a_{ij}$ denotes the formula of type j in the ith disjunct, and n and t
denote the number of disjuncts and the number of types, respectively.

Each disjunct $\bigwedge_{j=1}^{t} a_{ij}$ is implemented as an instance of a
class called compAtom (see Fig. 2). Each compAtom object represents a conjunction
of formulas each of which is either a boolean or an integer formula.

A composite formula stored in a CompSym object is implemented as a list of
compAtom objects, which corresponds to the disjunction in the DNF form above.
Figure 3 shows the internal representation of the composite formula

\[ (a > 0 \wedge a' = a + 1 \wedge b') \vee (a \le 0 \wedge a' = a \wedge b' = b) \]

in a CompSym object. The field atom is an array of pointers to class Symbolic and
the size of the array is the number of basic symbolic representations.

CompSym and compAtom classes use a TypeDescriptor class which records 
the variable types used in the input specification. Our library can adapt itself to 
any subset of the supported variable types, i.e., if a variable type is not present 
in the input specification, the symbolic library for that type will not be called 
during the execution. For example, given an input specification with no integer 
variables our tool will behave as a BDD-based model checker without making 
any calls to Omega Library. 

[Figure 3: object diagram of a CompSym object whose compositeRepresentation
field points to a LinkedList<compAtom> with two nodes. The atom array of the
first compAtom holds b' at index 0 and a > 0 ∧ a' = a + 1 at index 1; the atom
array of the second compAtom holds b' = b and a <= 0 ∧ a' = a.]

Fig. 3. An instance of CompSym class

A Simplifier class implements a simplifier engine that reduces the number
of disjuncts in the composite representations. Given a disjunctive formula A, it
searches for pairs of disjuncts $\bigwedge_{j=1}^{t} a_{ij}$ and
$\bigwedge_{j=1}^{t} a_{kj}$ that can be expressed as a single disjunct
$\bigwedge_{j=1}^{t} b_j$. Two disjuncts $\bigwedge_{j=1}^{t} a_{ij}$ and
$\bigwedge_{j=1}^{t} a_{kj}$ can be simplified to a single disjunct
$\bigwedge_{j=1}^{t} b_j$ if one of the following holds:

- $\bigwedge_{j=1}^{t} a_{ij}$ is a subset of $\bigwedge_{j=1}^{t} a_{kj}$. Then
  $\bigwedge_{j=1}^{t} b_j = \bigwedge_{j=1}^{t} a_{kj}$.

- $\bigwedge_{j=1}^{t} a_{ij}$ is a superset of $\bigwedge_{j=1}^{t} a_{kj}$. Then
  $\bigwedge_{j=1}^{t} b_j = \bigwedge_{j=1}^{t} a_{ij}$.

- There exists j such that $a_{ij}$ is not equal to $a_{kj}$ and for
  $1 \le m \le t$, $m \ne j$, $a_{im}$ is equal to $a_{km}$. Then for
  $1 \le m \le t$, $m \ne j$, $b_m = a_{im}$ and $b_j = a_{ij} \vee a_{kj}$.
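
The three merging rules can be sketched as follows. The sketch assumes a toy model that is not part of the library: each basic representation is an explicit frozenset of values, a compAtom is a tuple with one frozenset per type, and the function names merge and simplify are chosen here for illustration.

```python
def atom_subset(a, b):
    # Componentwise inclusion implies inclusion of the represented state sets.
    return all(x <= y for x, y in zip(a, b))


def merge(a, b):
    """Return a single atom equivalent to (a OR b) if one of the three rules
    applies, otherwise None."""
    if atom_subset(a, b):          # rule 1: a is a subset of b
        return b
    if atom_subset(b, a):          # rule 2: a is a superset of b
        return a
    differing = [j for j in range(len(a)) if a[j] != b[j]]
    if len(differing) == 1:        # rule 3: the atoms differ in exactly one type
        j = differing[0]
        return a[:j] + (a[j] | b[j],) + a[j + 1:]
    return None


def simplify(atoms):
    """Repeatedly merge pairs of disjuncts until no rule applies."""
    atoms = list(atoms)
    changed = True
    while changed:
        changed = False
        for i in range(len(atoms)):
            for k in range(i + 1, len(atoms)):
                merged = merge(atoms[i], atoms[k])
                if merged is not None:
                    atoms[i] = merged
                    del atoms[k]
                    changed = True
                    break
            if changed:
                break
    return atoms


atoms = [
    (frozenset({True}), frozenset({1, 2})),
    (frozenset({True}), frozenset({3})),
    (frozenset({False}), frozenset({1})),
    (frozenset({True, False}), frozenset({1})),
]
print(simplify(atoms))
```

In this example the first two atoms differ only in the integer component and are merged by the third rule, and the third atom is absorbed by the fourth through the subset rule, leaving two disjuncts.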

3 Algorithms for Manipulating Composite Representations

In this section, we present the algorithms used in compAtom and CompSym classes 
to implement the methods of Symbolic interface such as intersection, union, 
complement, image computations, subset, equality and satisfiability checks. Note 
that the algorithms given below are independent of the type and number of basic 
symbolic representations used. 

Throughout this section, CompSym objects A and B are assumed to be in the
following forms:

\[ A = \bigvee_{i=1}^{n_A} \bigwedge_{j=1}^{t} a_{ij}, \qquad
   B = \bigvee_{i=1}^{n_B} \bigwedge_{j=1}^{t} b_{ij} \]

and $n_A$ ($n_B$), $t$, and $T^{i}_{Op}$ denote the number of compAtom objects in
CompSym object A (B), the number of basic symbolic representations in the
composite library, and the time complexity of the ith symbolic representation
for operation Op, respectively.

Subset Relation Checking: Given two compAtom objects a and b, a.isSubset(b)
is evaluated by checking the corresponding symbolic representations in a and b
for subset relation (step 2 of the algorithm given below). Checking subset relation
for CompSym objects is more complicated. Given two CompSym objects, A and B,
A.isSubset(B) is evaluated as shown in the algorithm below. First, both A and
B are simplified, which has a time complexity of
$O((n_A^2 + n_B^2) \times \sum_{i=1}^{t} (T^{i}_{isEqual} + T^{i}_{isSubset}))$.
Then for each compAtom object a in A we check if a is a subset of B.
An efficient way to check if a compAtom object a is a subset of CompSym object B
is to compare a with each compAtom object b in B until a.isSubset(b) evaluates
to true. This is done in steps 7-11 below. However, if no such b can be found,
this does not mean that a is not a subset of B. In that case, we create a new
CompSym object C, which consists of the single compAtom object a. We take the
intersection of C and not B to obtain the CompSym object D. Then D is checked for
satisfiability. If D is satisfiable then a is not a subset of B (steps 13-19).
The time complexity of checking subset relation between two CompSym objects, A
and B, is $O(n_A \times n_B \times \cdots)$.

boolean compAtom::isSubset(compAtom other)
1  for i = 1 to numBasicTypes do
2    if not atom[i].isSubset(other.getAtom(basicTypes[i])) then
3      return false;
4  return true;

boolean CompSym::isSubset(Symbolic other)
 1  compAtom thisatom, otheratom; boolean found;
 2  LinkedList<compAtom> otherlist = other.getCompAtomList();
 3  this.simplify();
 4  other.simplify();
 5  for compRep.hasMore() do
 6    thisatom = compRep.getNext();
      found = false;
 7    for otherlist.hasMore() do
 8      otheratom = otherlist.getNext();
 9      if thisatom.isSubset(otheratom) then
10        found = true;
11        break;
12    if not found then
13      CompSym newsym1 = new CompSym(thisatom, isSet);
14      CompSym newsym2 = new CompSym(otherlist, isSet);
15      newsym2.complement();
16      newsym1.intersect(newsym2);
17      if newsym1.isSatisfiable() then
18        return false;
19      else continue;
20  return true;
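
A minimal executable sketch of the two-stage subset check is given below. It assumes the same toy model as before (explicit frozensets as basic representations, one per type) and a hypothetical finite universe per type so that complements are computable; the simplification step is omitted for brevity.

```python
# Hypothetical finite domains: one boolean variable and one integer in 0..3.
UNIVERSE = (frozenset({False, True}), frozenset(range(4)))


def atom_subset(a, b):
    return all(x <= y for x, y in zip(a, b))


def intersect(A, B):
    return [tuple(x & y for x, y in zip(a, b)) for a in A for b in B]


def complement(A):
    result = [UNIVERSE]  # the composite formula True
    for a in A:
        # not(a) is a disjunction with one atom per type: that type is negated.
        negated = [UNIVERSE[:j] + (UNIVERSE[j] - a[j],) + UNIVERSE[j + 1:]
                   for j in range(len(a))]
        result = intersect(result, negated)
    return result


def satisfiable(A):
    return any(all(component for component in a) for a in A)


def is_subset(A, B):
    for a in A:
        # Cheap test first: is a contained in a single atom of B?
        if any(atom_subset(a, b) for b in B):
            continue
        # Fallback: a is a subset of B iff (a AND NOT B) is unsatisfiable.
        if satisfiable(intersect([a], complement(B))):
            return False
    return True


A = [(frozenset({True, False}), frozenset({1}))]
B = [(frozenset({True}), frozenset({1, 2})),
     (frozenset({False}), frozenset({0, 1}))]
print(is_subset(A, B), is_subset(B, A))
```

Here the single atom of A is not contained in either atom of B, so the cheap test fails, yet A is covered by the two atoms of B together; the fallback via complement and intersection detects this.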

Equivalence Checking: Checking equivalence of two compAtom objects is
performed by calling the isEqual() method of each symbolic representation,
similar to subset checking. Equivalence of two CompSym objects is checked by
calling the isSubset() method of the CompSym class, and the time complexity of
the isEqual() method is the same as that of the CompSym::isSubset() method.

Satisfiability Checking: Checking satisfiability of a compAtom object is
performed by calling the isSatisfiable() method of each symbolic representation.
The condition for satisfiability of a compAtom object, a, is that each symbolic
representation in a must be satisfiable. Satisfiability of a CompSym object A
is equivalent to the existence of a compAtom object a in A such that a is
satisfiable. The time complexity of checking satisfiability of CompSym object A
is $O(n_A \times \sum_{i=1}^{t} T^{i}_{isSatisfiable})$.

Backward Image Computation: Backward image computation takes the transition
relation as the input parameter. The backward image of a compAtom object is
computed by calling the backwardImage() method of each symbolic representation
and passing the corresponding symbolic representation of the input compAtom
object as the parameter. To compute the backward image of a CompSym object A, a
new list of compAtoms is created, and for each compAtom object in A as many
copies are created as there are compAtom objects in the input CompSym object.
On each copy the backwardImage() method is called with a compAtom object of the
input CompSym object, and the resulting compAtom object is inserted in the new
list. At the end the compAtom list of A is replaced with this new list. The time
complexity of computing the backward image of CompSym object A over CompSym
object B is $O(n_A \times n_B \times \sum_{i=1}^{t} T^{i}_{backwardImage})$.

compAtom::backwardImage(compAtom other)
  for i = 1 to numBasicTypes do
    atom[i].backwardImage(other[i]);

CompSym::backwardImage(Symbolic other)
  compAtom thisatom, newatom;
  LinkedList<compAtom> newlist();
  LinkedList<compAtom> otherlist = other.getCompAtomList();
  for compRep.hasMore() do
    thisatom = compRep.getNext();
    for otherlist.hasMore() do
      newatom = thisatom;          // work on a copy of the current atom
      newatom.backwardImage(otherlist.getNext());
      newlist.insert(newatom);
  compRep = newlist;
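
The way the backward image distributes over the basic representations can be sketched on explicit sets. The model is an assumption made for illustration: state components are frozensets of values, each transition atom holds one relation per type as a set of (current, next) pairs, and the integer domain is truncated to a hypothetical range so that everything stays finite. The transition relation is the formula of Fig. 3.

```python
BOOLS = (False, True)
INTS = range(-1, 3)  # hypothetical bounded integer domain: -1, 0, 1, 2


def pre_basic(relation, targets):
    """Backward image for one type: current values with a successor in targets."""
    return frozenset(cur for cur, nxt in relation if nxt in targets)


def backward_image(states, transition):
    # One result atom per (state atom, transition atom) pair, computed
    # independently for each type.
    return [tuple(pre_basic(r_j, s_j) for r_j, s_j in zip(r, s))
            for s in states for r in transition]


# (a > 0 and a' = a + 1 and b') or (a <= 0 and a' = a and b' = b),
# with index 0 holding the boolean part and index 1 the integer part.
transition = [
    ({(b, True) for b in BOOLS},
     {(a, a + 1) for a in INTS if a > 0 and a + 1 in INTS}),
    ({(b, b) for b in BOOLS},
     {(a, a) for a in INTS if a <= 0}),
]

# Target states: b is true and a is 0 or 2.
states = [(frozenset({True}), frozenset({0, 2}))]
result = backward_image(states, transition)
print(result)
```

The result has one atom per pair of input atoms, which matches the $n_A \times n_B$ factor in the complexity bound.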

Intersection: Given two composite formulas, A and B, we define A intersection
B as

\[ A \wedge B = \bigvee_{i=1}^{n_A} \bigvee_{k=1}^{n_B} \bigwedge_{j=1}^{t}
   (a_{ij} \wedge b_{kj}) \qquad (1) \]

The intersection of two compAtom objects is computed by calling the intersect()
method of each symbolic representation and passing the corresponding symbolic
representation in the input compAtom object. To compute the intersection of two
CompSym objects, A and B, a new list of compAtoms is created and for each
compAtom object a in A and for each compAtom object b in B, the intersection of
a and b is computed and the resulting compAtom object is inserted into the new
list. At the end the compAtom list of A is replaced with the new list. The
number of disjuncts in the resulting CompSym object after intersection of two
CompSym objects, A and B, is $O(n_A \times n_B)$.

Complement : Given a composite formula A we define A’s complement as 
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Ua 

-lA — \J ^ ^aik (2) 

l<k<ti=l 

Complement of a compAtom object is computed by creating a new Compsym 
object for negation of each symbolic representation in the compAtom object. Then 
the union of each newly created CompSym object is the result of the complement 
as seen in the algorithm below. To compute the complement of a CompSym object 
A, a new CompSym object B, which is initialized to True, is created. For each 
compAtom object a in A, complement of a is intersected with B (steps 4-6). The 
number of disjuncts in the resulting CompSym object after complementation of 
CompSym object A is 

CompSym compAtom: : complement () 

1 CompSym result, temp; 

2 Symbolic sym; 

3 result = null; 

4 for i=l to numBasicTypes do 

5 if result != null then 

6 sym = atomfi] ; 

7 sym . complement ( ) ; 

8 result .union (new CompSym(sym, isSet ) ) ; 

9 else 

10 result = new CompSym(atom[i] , isSet ) ; 

11 return result; 

CompSym::complement()

1  CompSym result(true, isSet);
2  compAtom thisatom;
3  for compRep.hasMore() do
4      thisatom = compRep.getNext();
5      thisatom.complement();
6      result.intersect(thisatom);
7  this = result;
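As an illustration of the two complement procedures, the following Python sketch uses the same toy model as before: a symbolic representation is a frozenset over a small finite universe, and a compAtom is a tuple of such sets. The universes in DOMAINS and all function names are assumptions made for this example only.

```python
from itertools import product

# Hypothetical finite universe for each basic type.
DOMAINS = (frozenset({0, 1}), frozenset({0, 1, 2}))

def complement_atom(atom):
    """Negate a conjunction: one new atom per negated component, others unconstrained."""
    result = []
    for i, component in enumerate(atom):
        negated = DOMAINS[i] - component
        result.append(tuple(negated if k == i else DOMAINS[k] for k in range(len(atom))))
    return result

def intersect(A, B):
    """Pairwise intersection of the disjuncts of A and B."""
    return [tuple(x & y for x, y in zip(a, b)) for a, b in product(A, B)]

def complement(A):
    """Start from True and intersect with the complement of each disjunct."""
    result = [DOMAINS]
    for atom in A:
        result = intersect(result, complement_atom(atom))
    return result

def states(A):
    """Explicit set of states denoted by a composite formula."""
    return {s for atom in A for s in product(*atom)}

A = [(frozenset({0}), frozenset({1, 2})), (frozenset({1}), frozenset({0}))]
notA = complement(A)
```

With two basic types and two disjuncts, the sketch produces four disjuncts (some of them empty), which shows how quickly complementation can grow the representation.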



Union: Given two composite formulas, A and B, we define A union B as

$$A \vee B = \bigvee_{i=1}^{n_A+n_B} \bigwedge_{j=1}^{t} c_{ij} \qquad (3)$$

where for $1 \le i \le n_A$, $c_{ij} = a_{ij}$, and for $n_A + 1 \le i \le n_A + n_B$, $c_{ij} = b_{(i-n_A)j}$.
The union of two CompSym objects, A and B, is computed by inserting the compAtom
objects in B into the list of compAtom objects in A. The number of disjuncts in
the union of two CompSym objects, A and B, is $O(n_A + n_B)$.

4 A Polymorphic Verifier

Module TransSys in Fig. 2 is responsible for verification. It contains two main
functions, check and verify. Check is a recursive function that traverses the
syntax tree of a CTL formula to compute a symbolic representation for its truth
set.

TransSys contains the following members: transRelation (the transition
relation), stateSpace (defined by the domains of the variables in the input
specification) and initialState (defined by the initial condition of the input
specification). These define the transition system for the input specification.

The verify function determines whether the given CTL formula is satisfied
by the input specification by calling the check function. It prints the initial
states that violate the formula if the CTL formula is not satisfied by the input
specification.

Function check is the main part of the module. All computation is done
within this function. There are two types of operations: evaluation of logical
operators (and, or, not), and evaluation of CTL operators (EX, AX, EF, AF).
It assumes that all occurrences of atomic formulas (subformulas with no CTL
operators in them) are already converted into a Symbolic representation. Note
that CTL operators that can be expressed in terms of these primitives are first
converted into an equivalent representation (e.g., $AG(f) = \neg EF(\neg f)$).

Symbolic TransSys::check(Node n) {
  if (n.ofType() == CTLFORMULA)
    // logical operators: combine the truth sets of the subformulas
    switch n.getOperator()
      case AND:  s = check(n.left).intersectWith(check(n.right)); break;
      case OR:   s = check(n.left).unionWith(check(n.right)); break;
      case NOT:  s = check(n.left).complement(); break;
      case NONE: s = check(n.left);
  else if (n.ofType() == CTLOPERATOR)
    s = check(n.left);
    switch n.getOperator()
      case EX:
        s.backwardimage(transRelation);
        break;
      case AX:
        // AX f = not EX (not f)
        s.complement();
        s.backwardimage(transRelation);
        s.complement();
        break;
      case EF:
        // least fixpoint: add predecessors until nothing changes
        do
          snew = s;
          sold = s;
          snew.backwardimage(transRelation);
          s.unionWith(snew);
        while not sold.isEquals(s)
        break;
      case AF:
        do
          snew = s;
          sold = s;
          snew.complement();
          snew.backwardimage(transRelation);
          snew.complement();
          s.backwardimage(transRelation);
          s.intersectWith(snew);
          s.unionWith(sold);
        while not sold.isEquals(s)
        break;
  else if (n.ofType() == ATOMIC)
    s = n;
  return s;
}

[Figure: Initial: count = produced = consumed = 0 and size >= 1. Producer self-loop: (count < size) & produced' = produced + 1 & count' = count + 1. Consumer self-loop: (count > 0) & consumed' = consumed + 1 & count' = count - 1.]

Fig. 4. A simple bounded-buffer producer-consumer example



An important feature of function check is polymorphism. It is independent
of the underlying Symbolic type. Since each subclass of Symbolic implements the
basic functions used (e.g., intersectWith, backwardimage, etc.), the verifier does not
need to know which type of representation it is working on. If we introduce a
new symbolic type, we do not need to modify the verification procedure. Also,
using this feature the verifier can decide which symbolic representation to use
at run-time. For example, given an input specification with just boolean and
enumerated variables, our verifier becomes a BDD-based model checker. Hence,
such specifications can be checked efficiently without introducing the cost of
manipulating composite representations.
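The following Python sketch illustrates the idea of writing the fixpoint computation only against an interface. It is not the library's code: the class ExplicitSet (an explicit set of states standing in for a BDD or a composite representation), its method names, and the function check_ef are all hypothetical.

```python
class ExplicitSet:
    """Toy symbolic representation: an explicit, immutable set of states."""

    def __init__(self, states):
        self.states = frozenset(states)

    def union_with(self, other):
        return ExplicitSet(self.states | other.states)

    def backward_image(self, trans):
        # trans is a set of (source, target) pairs; keep sources that reach self.
        return ExplicitSet(s for (s, t) in trans if t in self.states)

    def is_equal(self, other):
        return self.states == other.states


def check_ef(s, trans):
    """Least fixpoint of Z = s or EX Z, using only the interface methods."""
    while True:
        old = s
        s = s.union_with(s.backward_image(trans))
        if s.is_equal(old):
            return s


trans = {(0, 1), (1, 2), (2, 2), (3, 3)}
reach_2 = check_ef(ExplicitSet({2}), trans)
```

Any other class offering the same three methods could be passed to check_ef unchanged, which is the property the verifier relies on.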

5 A Simple Example 

In Fig. 4 we show a simple producer-consumer system. Both the producer and the
consumer components have N control states. The producer produces an item only when
it is in control state N and there is available space in the buffer (count < size).
When it produces an item it increases produced and count by 1. Similarly, the
consumer consumes an item only when it is in control state N and there is
an item in the buffer (count > 0). When it consumes an item it increases
consumed by 1 and decreases count by 1. An invariant of this system is
$count \le size \wedge produced - consumed = count$.

[Plot: Performance of Composite Model Checker]

Fig. 5. Performance of composite model checker and Omega Library model
checker (using partitioning or mapping approach) on the bounded-buffer
producer-consumer example

The initial condition for this system can be represented with the composite
formula:

$$pstate = 1 \wedge cstate = 1 \wedge count = 0 \wedge produced = 0 \wedge consumed = 0 \wedge size \ge 0$$

where pstate and cstate are variables introduced to model the control states
of the producer and the consumer. The self-loop on state N for the producer can be
represented with the composite formula:

$$pstate = N \wedge pstate' = N \wedge count < size \wedge produced' = produced + 1 \wedge count' = count + 1$$

The overall transition relation is the disjunction of the formulas that cor- 
respond to each arc in Figure 4. (We are assuming that if a variable is not 
modified it preserves its value. These constraints have to be added to the com- 
posite formula before generating a CompSym object). 
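As a sanity check on the example, the following Python sketch simulates only the two self-loops of the bounded buffer and tests the stated invariant along random runs. It is an explicit-state illustration, not the symbolic encoding: control states are ignored, and the names step, invariant and run are hypothetical.

```python
import random

def step(state, size, rng):
    """Take one enabled self-loop, or return None when none is enabled."""
    count, produced, consumed = state
    moves = []
    if count < size:   # producer guard
        moves.append((count + 1, produced + 1, consumed))
    if count > 0:      # consumer guard
        moves.append((count - 1, produced, consumed + 1))
    return rng.choice(moves) if moves else None

def invariant(state, size):
    """count <= size and produced - consumed == count."""
    count, produced, consumed = state
    return count <= size and produced - consumed == count

def run(size, steps, seed=0):
    """Random run from count = produced = consumed = 0."""
    rng = random.Random(seed)
    state = (0, 0, 0)
    trace = [state]
    for _ in range(steps):
        state = step(state, size, rng)
        if state is None:
            break
        trace.append(state)
    return trace
```

Variables that a move does not mention keep their value in step, which mirrors the frame constraints that have to be added to the composite formula.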

We used this example to compare the performance of our composite model
checker with OMC (Omega Library Model Checker) presented in [BGP97,BGP99].
OMC uses polyhedral representations of arithmetic constraints as a symbolic
representation. To represent the control states of the system given in Fig. 4 in
such a tool, there are two options: 1) to partition the state space based on the
control states, creating N partition classes, or 2) to map the control states to an
integer variable. Neither option is very efficient because of the high complexity
of manipulating arithmetic constraint representations. In our composite library
the control states in the above example are mapped to an enumerated variable
which is encoded using BDDs. Integer variables are still encoded using the
polyhedral representation; however, the unnecessary mapping to integers is prevented.
Fig. 5 shows the execution time of the composite model checker and the OMC
(using both partitioning and integer-mapping) with the increasing number of
control states for the system given in Fig. 4. Although this is a small example,
it demonstrates the inefficiency of using a model checker which is solely based
on polyhedral representations.



6 Conclusion and Future Work 

The composite symbolic library presented in this paper can be used as a platform 
to integrate different symbolic representations. Using composite representations 
one can improve the efficiency of verification procedures by mapping each vari- 
able type in the input specification to a suitable symbolic representation. 

The Symbolic interface provided by our tool can be useful in integrating 
different symbolic libraries. Once a wrapper for a symbolic library is written 
the internal representations of that library will be hidden. This can also help in 
comparing performances of different symbolic representations by isolating them 
from the verification procedures. 

Using the composite symbolic library we were able to develop polymorphic 
verification procedures which are oblivious to the symbolic representation used. 
Hence, the decision of which symbolic representation to use can be made at 
run-time, based on the input specification. If the input specification has only 
boolean and enumerated variables, then our verifier becomes a BDD-based sym- 
bolic model checker. However, if both integer and boolean variables are present 
in the input specification, then it is able to use arithmetic constraints and BDDs 
together using the composite representation.

Abstract. Deductive verification of progress properties relies on finding
ranking functions to prove termination of program cycles. We present an 
algorithm to synthesize linear ranking functions that can establish such 
termination. Fundamental to our approach is the representation of sys- 
tems of linear inequalities and sets of linear expressions as polyhedral 
cones. This representation allows us to reduce the search for linear rank- 
ing functions to the computation of polars, intersections and projections 
of polyhedral cones, problems which have well-known solutions. 



1 Introduction 

Deductive verification of reactive systems relies on finding invariants and ranking 
functions. While automatic generation of invariants has received much attention 
[GW75,KM76,BLS96,BBM97], automatic generation of ranking functions has
only recently started to emerge [DGG00].

Proofs of progress properties of systems (that is, properties that the system
will achieve a certain goal) involve showing that cycles on the path to the goal
terminate. The classical method for establishing such termination is the use of 
well-founded domains together with so-called ranking functions that assign a 
value from these domains to each program state. Progress is then shown by 
demonstrating that each step in the cycle reduces the measure assigned by the 
ranking function. As there can be no infinite descending chain of elements of a 
well-founded domain, the cycle must eventually terminate. Clearly the existence 
of such a ranking function implies termination. Conversely, it has been proven 
that if a cycle terminates, a ranking function exists. 

Recent years have seen great progress in automating deductive verification 
by improvements in decision procedures, invariant generation and automatic ab- 
straction. However, the synthesis of ranking functions remains largely a manual 
task. Some heuristics have been proposed in [DGG00], but these are limited to
functions that appear as expressions in the program text.

99-00984-001, by ARO grant DAAG55-98-1-0471, by ARO/MURI grant DAAH04-
96-1-0341, by ARPA/Army contract DABT63-96-C-0096, and by ARPA/AirForce 
contracts F33615-00-C-1693 and F33615-99-C-3014. 

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 67-81, 2001. 

(c) Springer-Verlag Berlin Heidelberg 2001

In this paper we propose an algorithm that generates ranking functions for a
program cycle that are linear in the program variables. The algorithm consists
of three steps. First, it derives a set of linear expressions that are bounded inside
the cycle from some cycle invariant. Second, it derives a set of linear expressions
that decrease discretely around the cycle from the cycle's transition relation.
The third step then computes the intersection of these two sets. Any expression
in the intersection serves as a ranking function, and thus nonemptiness of the
intersection proves termination of the cycle.

The remainder of the paper is organized as follows. Section 2 presents our
computational model of transition systems, gives some background on
well-founded domains, and introduces the running example. In Sec. 3 we introduce
polyhedral cones and demonstrate that problems involving systems of linear
inequalities can be reduced to problems over cones. Our algorithm is presented
in Sec. 4, and its application is illustrated on the example program. In Sec. 5 we
discuss a more complex application of the algorithm, and in Sec. 6 we conclude
with some discussion and limitations of our approach.



2 Preliminaries 

The computational model used to describe programs is that of a transition system
[MP95] (fts), $S = \langle V, \Theta, T \rangle$, where $V$ is a finite set of variables, $\Theta$ is an initial
condition, and $T$ is a finite set of transitions. A state $s$ is an interpretation of
$V$. Each transition $\tau \in T$ is represented by a transition relation $\rho_\tau$, an assertion
that expresses the relation between the values of $V$ in some state $s$ and the values
of $V$ (referred to by $V'$) in any state $s'$ to which the system can transition by
taking $\tau$. A run of $S$ is a sequence of states such that the first state satisfies $\Theta$
and any two consecutive states satisfy $\rho_\tau$ for some $\tau \in T$. A state $s$ is accessible
if $s$ appears in some run of $S$. The set of all accessible states is $\Sigma$.

A relational domain (or just domain) $(\mathcal{D}, \succ)$ is a set $\mathcal{D}$ paired with a binary
relation $\succ$ on $\mathcal{D}$. A domain is said to be well-founded if there are no infinite
sequences of elements of $\mathcal{D}$ which decrease under $\succ$. A function $f$ is said to
map $(\mathcal{D}_1, \succ_1)$ into $(\mathcal{D}_2, \succ_2)$ if $f$ maps $\mathcal{D}_1$ into $\mathcal{D}_2$ and is monotone, that is,
$f(d_1) \succ_2 f(d_2)$ for all $d_1, d_2 \in \mathcal{D}_1$ such that $d_1 \succ_1 d_2$. Notice that $f$ maps infinite
decreasing sequences in $(\mathcal{D}_1, \succ_1)$ to infinite decreasing sequences in $(\mathcal{D}_2, \succ_2)$. A
ranking function for a domain is any function that maps it into some domain that
is known to be well-founded, such as the non-negative integers with the
greater-than relation. Notice that any domain for which a ranking function exists is
well-founded.

Ranking functions can also be used to establish the termination of transition
systems. Let $\mathcal{R} = \bigcup \{\rho_\tau \mid \tau \in T\}$ be the combined transition relation of $S$. The
decreasing sequences of $(\Sigma, \mathcal{R})$ are precisely the suffixes of runs of $S$. Thus $S$
has an infinite run iff the domain $(\Sigma, \mathcal{R})$ has an infinite decreasing sequence.
Therefore the termination of $S$ is equivalent to the well-foundedness of $(\Sigma, \mathcal{R})$,
and a ranking function for $(\Sigma, \mathcal{R})$ certifies that $S$ terminates.

Our algorithm generates ranking functions that map $(\Sigma, \mathcal{R})$ into the
well-founded domain $(Rat_\Lambda, >_\Delta)$ of rationals greater than some constant value $\Lambda$,
with the discretely-greater-than relation defined by

$$x >_\Delta y \text{ iff } x \ge y + \Delta ,$$

where $\Delta$ is a positive constant.
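To see why such a domain is well-founded, note that every discrete decrease lowers the value by at least the positive constant, so only finitely many decreases fit above the lower bound. The Python sketch below makes this concrete with exact rationals. It assumes the relation is read as "x is at least y plus the constant" and that values may equal the bound; the names discretely_greater and max_steps are hypothetical.

```python
from fractions import Fraction

def discretely_greater(x, y, delta):
    """Discretely-greater-than, read here as x >= y + delta (an assumption)."""
    return x >= y + delta

def max_steps(start, bound, delta):
    """Largest n with start - n * delta >= bound: the longest possible chain."""
    return (start - bound) // delta

start, bound, delta = Fraction(5), Fraction(0), Fraction(1, 2)
# The slowest possible descent: decrease by exactly delta at every step.
chain = [start - n * delta for n in range(max_steps(start, bound, delta) + 1)]
```

Any other decreasing chain from the same starting value drops at least as fast, so it cannot be longer than this one.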



Example 

Consider the program TERMINATE, presented in Fig. 1. The expression $-i - j$
decreases by 1 with each iteration of the cycle and its value is bounded
from below by $-100 - k_0$, where $k_0$ is the value of $k$ upon entry into the loop.
Therefore, $-i - j$ defines a ranking function for the cycle, and the program
terminates.



local i, j, k : integer
ℓ0 : while i < 100 ∧ j < k do
ℓ1 :     (i, j) := (j, i + 1)
ℓ2 :     k := k - 1
ℓ3 : halt



Fig. 1. Program TERMINATE 



Notice that this system is not finite-state, so termination cannot be
established by model checking. In addition, the expression $-i - j$ does not appear
anywhere in the program, so analytic heuristics of the form proposed in [DGG00]
are unlikely to discover it. Furthermore, the expression $k$, which seems the most
promising analytic ranking function, has no obvious lower bound. In fact, it is
bounded from below by $\min(j_0, i_0)$, but it is not clear that discovering this bound
is any easier than finding the expression $-i - j$.

In Sec. 4 we will demonstrate that the ranking function $-i - j$ can be
generated automatically.
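The claim about the example can be checked empirically with a short Python simulation of the loop of program TERMINATE. This is only a test on sample inputs, not a proof; the function name terminate_trace is hypothetical, and the loop body follows the program as reconstructed from Fig. 1.

```python
def terminate_trace(i, j, k):
    """Run the loop of TERMINATE, recording -i - j at every visit to the loop head."""
    ranks = [-i - j]
    while i < 100 and j < k:
        i, j = j, i + 1      # simultaneous assignment (i, j) := (j, i + 1)
        k = k - 1
        ranks.append(-i - j)
    return ranks

ranks = terminate_trace(0, 0, 10)
```

Every entry of ranks except the last is recorded in a state that satisfies the loop guard, so those are the values to compare against the lower bound of minus 100 minus the entry value of k.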

3 Linear Inequalities and Polyhedral Cones 

Our method is inherently deductive. It reduces the synthesis of linear ranking 
functions to the search for linear inequalities implied by systems of inequalities 
extracted from the program. Essential to the method, then, is the approach used 
to derive such consequences. 

Consider any system of linear inequalities $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d \le 0$. Recall that
the inequality $\alpha_1x_1 + \cdots + \alpha_dx_d \le 0$ is a consequence of the system iff it is satisfied
by every solution of the system. Two well-known rules for deducing consequences
of such systems are that any inequality can be scaled by a non-negative factor,
and that any pair of inequalities can be added. These two inference rules can be
combined to yield a single rule which derives any non-negative linear combination
of inequalities. It is this sound and complete inference rule which motivates the
treatment of linear inequalities developed here.
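The soundness half of this rule is easy to exercise on a small instance. The Python sketch below forms a non-negative combination of two coefficient vectors and checks, on a finite grid of sample points, that every solution of the system also satisfies the derived inequality. The example system and the names conic_combination and satisfies are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

def conic_combination(weights, vectors):
    """Sum of weights[i] * vectors[i]; every weight must be non-negative."""
    if any(w < 0 for w in weights):
        raise ValueError("conic combinations need non-negative weights")
    dim = len(vectors[0])
    return tuple(sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim))

def satisfies(coeffs, point):
    """True when coeffs . point <= 0."""
    return sum(c * x for c, x in zip(coeffs, point)) <= 0

# System: x1 - x2 <= 0 and x2 - x3 <= 0, given as coefficient vectors.
system = [(1, -1, 0), (0, 1, -1)]
derived = conic_combination([Fraction(2), Fraction(2)], system)  # 2*x1 - 2*x3 <= 0

grid = list(product(range(-2, 3), repeat=3))
solutions = [p for p in grid if all(satisfies(r, p) for r in system)]
```

Completeness, the statement that every consequence arises this way, is the harder direction and is not something a finite sample can establish.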

A vector $w$ is a conic combination of vectors $v_1, \ldots, v_n$ iff $w = \sum_i \lambda_i v_i$ for
scalars $\lambda_i \ge 0$. A cone is any set of vectors closed under conic combinations.
Thus a cone is a (vector) space in which the linear combinations are restricted
to non-negative factors. Every space is a cone since spaces are closed under
negation. As the vector $0$ is the conic combination of the empty set, the least
cone is $\{0\}$, not $\emptyset$. The greatest cone is the set containing all vectors (of a given
dimension).

It is easy to see that the intersection of two cones is again a cone. However, 
the union of two cones need not form a cone. This observation motivates the 
introduction of the following two concepts. The conic hull of a set of vectors $V$,
written $Con(V)$, is the set of conic combinations of $V$. The conic union of two
cones $\mathcal{C}_1$, $\mathcal{C}_2$, written $\mathcal{C}_1 \uplus \mathcal{C}_2$, is the cone $Con(\mathcal{C}_1 \cup \mathcal{C}_2)$. Thus the conic hull of a
set is the least cone containing it, while the conic union of two cones is the least
cone containing both.

A set of vectors $R$ is called a ray (or half-line) if $R = Con(r)$ for some vector
$r \ne 0$. A set of vectors $L$ is called a line if $L = Lin(l)$ for some vector $l \ne 0$,
where the linear hull $Lin(V)$ is the set of linear combinations of $V$. Thus a ray
is a unit cone, while a line is a unit space. A pair $G = (L, R)$ of lines $L$ and
rays $R$ is called a generator of the cone $\mathcal{C}$ iff $\mathcal{C} = Lin(L) \uplus Con(R)$. Lines are
not essential components of generators. Since $Lin(l) = Con(l) \uplus Con(-l)$, every
line can be replaced by a pair of rays in opposite directions without
changing the generated cone. To simplify the theory, we assume all lines have been
eliminated in this manner. In practice, however, maintaining an explicit
representation of lines improves both the space and time complexity of algorithms on
cones [Tel82,Wil93].

Notice that every cone has a generator, as C certainly generates itself. How- 
ever, unlike spaces, some cones admit only infinite generators. Cones which do 
admit finite generators are said to be polyhedral. 

Returning to linear inequalities, the inequality $\alpha_1x_1 + \cdots + \alpha_dx_d \le 0$ can be
represented by the vector $(\alpha_1, \ldots, \alpha_d)$ of its coefficients, and the ray determined
by this vector is the set of consequences of the inequality. A system of inequalities
is represented by the set of its coefficient vectors, and the cone generated by these
rays yields precisely the consequences of the system - a fact which we prove
presently. Should the system also contain equalities, they can be represented
either implicitly, as pairs of rays in opposite directions, or explicitly, as lines.

The polar $\mathcal{C}^*$ of a cone $\mathcal{C}$ is the set of vectors forming non-acute angles with
every member of $\mathcal{C}$, i.e., the set $\{u \mid u \cdot v \le 0 \text{ for all } v \in \mathcal{C}\}$, as illustrated in
Fig. 2. The polar of a cone is itself a cone. In fact, the polar of any set of vectors
is a cone, but our interest here lies in polars of cones. Polars of cones have the
following properties

- $(\mathcal{C}^*)^* \supseteq \mathcal{C}$,
- $(\mathcal{C}_1 \uplus \mathcal{C}_2)^* = \mathcal{C}_1^* \cap \mathcal{C}_2^*$,
- $(\mathcal{C}_1 \cap \mathcal{C}_2)^* \supseteq \mathcal{C}_1^* \uplus \mathcal{C}_2^*$.

For arbitrary cones, the two inclusions cannot be strengthened to equalities.
However, for polyhedral cones we have

- $(\mathcal{C}^*)^* = \mathcal{C}$,
- $(\mathcal{C}_1 \cap \mathcal{C}_2)^* = \mathcal{C}_1^* \uplus \mathcal{C}_2^*$.

These equalities are implied by a fundamental result, due to Weyl and Minkowski, 
that a cone is polyhedral iff its polar is polyhedral. Another fundamental theorem 
concerning polars of polyhedral cones is the following. 

Theorem 1 (Alternative) Let $G = \{r_1, \ldots, r_n\}$ be a set of vectors and $r$ be a
vector. Either i) $r \in Con(G)$ or ii) $v \cdot r > 0$ for some $v \in G^*$, but not both.




Fig. 2. A cone and its polar. 



Our primary interest in polar cones is driven by the relationship they bear
to solutions of systems of inequalities. A vector $(x_1, \ldots, x_d)$ is a solution of the
inequality $\alpha_1x_1 + \cdots + \alpha_dx_d \le 0$ iff $(x_1, \ldots, x_d) \cdot (\alpha_1, \ldots, \alpha_d) \le 0$. For a system
of inequalities, the set of all solutions is precisely the polar of its set of coefficient
vectors. With this observation, we are in a position to justify the soundness and
completeness of the inference rule presented above.
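Since a conic combination of vectors has a non-positive inner product with a vector whenever each of its summands does, membership in the polar of a finitely generated cone only needs to be tested against the generators. A minimal Python sketch of that test follows; the names dot and in_polar and the example system are assumptions of this illustration.

```python
def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def in_polar(u, generators):
    """u lies in the polar of Con(generators) iff u . g <= 0 for every generator g."""
    return all(dot(u, g) <= 0 for g in generators)

# Coefficient vectors of the system x1 - x2 <= 0 and -x1 <= 0.
G = [(1, -1), (-1, 0)]
```

For instance, in_polar((1, 2), G) holds because the point (1, 2) solves both inequalities, while (2, 1) violates the first one and is rejected.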

Theorem 2 (Farkas' Lemma) Let $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d \le 0$ be a system of
inequalities. Let $G = \{r_i\}$, where $r_i = (\alpha_{i1}, \ldots, \alpha_{id})$. Then $\alpha_1x_1 + \cdots + \alpha_dx_d \le 0$
is a consequence of the system iff $(\alpha_1, \ldots, \alpha_d) \in Con(G)$.

This result is easily proved using the theorem of the alternative. In the case
of soundness, assume $(\alpha_1, \ldots, \alpha_d) \in Con(G)$. Then for all $(x_1, \ldots, x_d) \in G^*$,
$(x_1, \ldots, x_d) \cdot (\alpha_1, \ldots, \alpha_d) \le 0$. That is, every solution of the system satisfies
the inequality. For completeness, assume $(\alpha_1, \ldots, \alpha_d) \notin Con(G)$. Then for some
$(x_1, \ldots, x_d) \in G^*$, $(x_1, \ldots, x_d) \cdot (\alpha_1, \ldots, \alpha_d) > 0$. That is, some solution of the
system fails to satisfy the inequality.

Michael A. Colon and Henny B. Sipma

Thus far, we have demonstrated that the polar of a system of linear inequal- 
ities represents its solutions. Another perspective on these polars is that they 
too represent systems of inequalities. In this case, the inequalities are constraints 
not on solutions, but on the coefficients of consequences of the original system. 
That is, any solution $(\alpha_1, \ldots, \alpha_d)$ of the polar is in fact the coefficient vector
of an inequality implied by the original system, as $(\mathcal{C}^*)^* = \mathcal{C}$. This perspective
on polars of systems of linear inequalities is also integral to the method for
synthesizing ranking functions presented here. By computing the polar of a sys- 
tem, adding additional inequalities, and computing the polar of the augmented 
polar, the algorithm presented in Sec. 4 derives those consequences of systems 
which satisfy syntactic criteria sufficient to guarantee the existence of ranking 
functions. 

To compute polars, our method uses an algorithm, known as the Double
Description Method, which is based on Motzkin's constructive proof of Minkowski's
theorem [MRTT53,FP96]. This algorithm constructs the generator of the polar
incrementally by successively intersecting the generator of the entire space with 
the polar of each ray in the generator of the cone. 

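The algorithm also serves as the basis for implementing additional operators
on cones. For two polyhedral cones $\mathcal{C}_1$, $\mathcal{C}_2$ such that $\mathcal{C}_i = Con(G_i) = Con(H_i)^*$:

- $\mathcal{C}_1 \uplus \mathcal{C}_2 = Con(G_1 \cup G_2)$,
- $\mathcal{C}_1 \cap \mathcal{C}_2 = Con(H_1 \cup H_2)^*$, and
- $\mathcal{C}_1 \subseteq \mathcal{C}_2$ iff $r \cdot s \le 0$ for every $r \in G_1$ and $s \in H_2$.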

Another useful operation on polyhedral cones is projection onto a space. It 
is performed by intersecting the cone with the space and eliminating positions 
that are zero in every ray of the resulting generator. When the space has a basis 
consisting of canonical unit vectors, we can simply eliminate those positions in 
the generator of the cone that are zero in every line of the basis of the space. 

For a more thorough discussion of the theory of polyhedral cones, the reader is 
referred to [Gal60,Sch86]. Those interested in the implementation of algorithms 
on cones should consult [Wil93]. 



Inhomogeneous Inequalities 

Having demonstrated the equivalence of systems of homogeneous inequalities 
and polyhedral cones, we turn now to inhomogeneous systems and argue that 
they too can be represented as polyhedral cones. 

Consider an inhomogeneous system $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d + \beta_i \le 0$. The set of
solutions of such a system is no longer a polyhedral cone, but rather a
polyhedral convex set. However, by the addition of a single variable $\xi$, the solutions of
an inhomogeneous system can be embedded in a polyhedral cone. To see this,
consider that the inhomogeneous system is equivalent to the homogeneous
system $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d + \beta_i\xi \le 0$, along with the single inhomogeneous side
condition $\xi = 1$. That is, $(x_1, \ldots, x_d)$ is a solution of the inhomogeneous system
iff $(x_1, \ldots, x_d, 1)$ is a solution of its homogeneous embedding.
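The embedding amounts to appending the constant term as one more coefficient and appending 1 to every point. A small Python sketch, with hypothetical helper names and an arbitrarily chosen example inequality, checks the stated equivalence on a grid of integer points.

```python
from itertools import product

def embed(coeffs, beta):
    """Coefficient vector of the homogeneous inequality with the extra variable."""
    return tuple(coeffs) + (beta,)

def holds_inhomogeneous(coeffs, beta, point):
    """a1*x1 + ... + ad*xd + beta <= 0."""
    return sum(a * x for a, x in zip(coeffs, point)) + beta <= 0

def holds_homogeneous(vector, point):
    """vector . point <= 0."""
    return sum(a * x for a, x in zip(vector, point)) <= 0

# Example inequality: x1 - 2*x2 + 3 <= 0.
coeffs, beta = (1, -2), 3
grid = list(product(range(-4, 5), repeat=2))
agree = all(
    holds_inhomogeneous(coeffs, beta, p) == holds_homogeneous(embed(coeffs, beta), p + (1,))
    for p in grid
)
```

Points of the embedding whose last coordinate differs from 1 have no counterpart in the original system, which is why the side condition is needed.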

While the set of solutions of an inhomogeneous system is a polyhedron, not 
a cone, its set of consequences remains a cone. Taking conic combinations of 
inequalities continues to be a sound inference rule when applied to inhomoge- 
neous systems. Furthermore, with two minor modifications, it is also complete. 
First, the tautology $-1 \le 0$ must be added to the system. The need for this
apparently redundant inequality can be attributed to the side condition of the
homogeneous embedding. Although $\xi = 1$ cannot be represented explicitly in
the embedding, it has as a consequence the homogeneous inequality $-\xi \le 0$,
which is representable, but not derivable. Therefore, this inequality must be 
added explicitly. 

The second modification concerns unsatisfiable systems. Consider the system
consisting of the single inequality $1 \le 0$. The system is unsatisfiable, so every
inequality is a consequence of it. For example, $x_i \le 0$ is a consequence for any $i$.
However, $x_i \le 0$ is not a conic combination of $1 \le 0$ and $-1 \le 0$, where the
second inequality is added for the reasons previously given. For unsatisfiable
systems, the taking of conic combinations is sound, but incomplete. Care must
be taken, then, to detect unsatisfiability and to replace unsatisfiable systems
with an equivalent system which generates all inequalities.

The next theorem, which we state without proof, provides a procedure for 
detecting unsatisfiability and shows that inferring conic combinations is sound 
and complete for satisfiable systems. 

Theorem 3 Let $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d + \beta_i \le 0$ be a system of inequalities. Let
$r_0 = (0, \ldots, 0, -1)$, and let $r_i = (\alpha_{i1}, \ldots, \alpha_{id}, \beta_i)$. Let $G = \{r_i\}$. The system is
unsatisfiable iff $(0, \ldots, 0, 1) \in Con(G)$. When satisfiable, $\alpha_1x_1 + \cdots + \alpha_dx_d + \beta \le 0$
is a consequence iff $(\alpha_1, \ldots, \alpha_d, \beta) \in Con(G)$.



Strict Inequalities 

Consider now the mixed inhomogeneous system $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d + \beta_i \;\{<, \le\}\; 0$,
containing both strict and weak inequalities. The solutions of this system are
embedded in the cone of solutions of the weak homogeneous system
$\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d + \beta_i\xi + \delta_i\epsilon \le 0$, where $\delta_i$ is positive or zero depending on whether the
$i$th inequality is strict or weak, along with the side conditions $\xi = 1$ and $\epsilon > 0$.
That is, $(x_1, \ldots, x_d)$ is a solution of the original system iff $(x_1, \ldots, x_d, 1, \epsilon)$ is a
solution of its embedding for some $\epsilon > 0$.

The consequences of the mixed inhomogeneous system are the members of the
cone generated by its weak homogeneous embedding, provided the embedding is
augmented with two additional inequalities. First, it is necessary to add
$-\xi + \epsilon \le 0$, which is the representation of the tautology $-1 < 0$. Second, $-\epsilon \le 0$ must
also be added, as it is a representable, but not derivable consequence of the
side condition $\epsilon > 0$. The presence of this second inequality guarantees that the
coefficient of $\epsilon$ in any consequence of the weak system can be driven to zero.
Thus, it plays the role of the well-known inference rule for mixed systems which
allows any strict inequality to be weakened.

The following theorem, a variant of which appears to have been first proved 
by Kuhn [Kuh56], demonstrates the soundness and completeness of this approach 
to mixed inhomogeneous systems. 

Theorem 4 Let $\alpha_{i1}x_1 + \cdots + \alpha_{id}x_d + \beta_i \;\{<, \le\}\; 0$ be a system of inequalities.
Let $r_{-1} = (0, \ldots, -1, 1)$, $r_0 = (0, \ldots, 0, -1)$ and $r_i = (\alpha_{i1}, \ldots, \alpha_{id}, \beta_i, \delta_i)$, where
$\delta_i > 0$ when strict and $\delta_i = 0$ when weak. Let $G = \{r_i\}$. The system is
unsatisfiable iff $(0, \ldots, 0, 1) \in Con(G)$. When satisfiable, $\alpha_1x_1 + \cdots + \alpha_dx_d + \beta \;\{<, \le\}\; 0$
is a consequence iff $(\alpha_1, \ldots, \alpha_d, \beta, \delta) \in Con(G)$ for some appropriate $\delta$.

Note that if our interest lies in just the weak consequences of a mixed system, 
we can simply treat each strict inequality as if it were weak. However, there is 
no generator of only the strict consequences. In fact, that set is not a cone as it 
is not closed under scaling by zero. 

4 Generating Linear Ranking Functions 

Our objective is to show that a transition system $S = \langle V, \Theta, T \rangle$ terminates. We
do so by attempting to find a ranking function for each cycle in $S$.

Let $(\Sigma, \mathcal{R})$ be the domain associated with $S$, as described in Sec. 2, and let
$(\Sigma^A, \mathcal{R}^A)$ be a finite abstraction of $(\Sigma, \mathcal{R})$ with abstraction function $\alpha : \Sigma \to \Sigma^A$
for some finite set $\Sigma^A$. That is, $(s_1^A, s_2^A) \in \mathcal{R}^A$ iff there exist $s_1, s_2 \in \Sigma$
such that $s_1^A = \alpha(s_1)$ and $s_2^A = \alpha(s_2)$ and $(s_1, s_2) \in \mathcal{R}$. Thus $\Sigma^A$ induces a finite
partition on $\Sigma$. When $\Sigma^A$ is the partition induced by the control variables of the
system, $(\Sigma^A, \mathcal{R}^A)$ is called the control flow graph of $(\Sigma, \mathcal{R})$. Let $\gamma : \Sigma^A \to 2^\Sigma$
be the function that maps each abstract state $s^A$ to the set of concrete states
it represents, i.e., $\{s \in \Sigma \mid \alpha(s) = s^A\}$. To show that $(\Sigma, \mathcal{R})$ is well-founded it
suffices to show that for each cycle in $(\Sigma^A, \mathcal{R}^A)$, no infinite decreasing sequence
of $(\Sigma, \mathcal{R})$ is mapped to that cycle. This is so because any infinite decreasing
sequence in $(\Sigma, \mathcal{R})$ is mapped to an infinite decreasing sequence in $(\Sigma^A, \mathcal{R}^A)$,
which must end in a cycle, as $\Sigma^A$ is finite.

Consider an arbitrary cycle $C \subseteq \Sigma^A$ of $(\Sigma^A, \mathcal{R}^A)$ and let $c^A \in C$ be any
element of that cycle. Let $\mathcal{R}_c^A$ be the composition of the transition relations
along the cycle from $c^A$ back to $c^A$. Any infinite decreasing sequence that ends
in this cycle induces an infinite decreasing sequence in $(\{c^A\}, \mathcal{R}_c^A)$ and hence in
$(\gamma(c^A), \gamma(\mathcal{R}_c^A))$. Our approach to proving the well-foundedness of $(\Sigma, \mathcal{R})$ is to
prove that for each cycle of $(\Sigma^A, \mathcal{R}^A)$ there exists some element $c^A$ such that
$(\gamma(c^A), \gamma(\mathcal{R}_c^A))$ is well-founded.

In the remainder of this section we will assume that the well-foundedness of
$(\Sigma, \mathcal{R})$ is to be established, where $\Sigma$ stands for $\gamma(c^A)$, the set of states accessible
at the chosen element of the cycle, and $\mathcal{R}$ stands for $\gamma(\mathcal{R}_c^A)$, the transition
relation around the cycle. To show that $(\Sigma, \mathcal{R})$ is well-founded, our algorithm
attempts to generate all functions $f$ of the form

$$f : \alpha_1x_1 + \cdots + \alpha_dx_d$$

that map $(\Sigma, \mathcal{R})$ into the well-founded domain $(Rat_\Lambda, >_\Delta)$, for some constant $\Lambda$
and positive constant $\Delta$. The algorithm computes all functions definable as linear
expressions over the rational program variables $x_1, \ldots, x_d$, which are bounded
and discretely decreasing for $(\Sigma, \mathcal{R})$. It does so by computing approximations of
the set of bounded expressions and the set of decreasing expressions and taking
their intersection.

The principle behind the algorithm is the representation of these sets as poly- 
hedral cones. Up to this point, we have demonstrated that polyhedral cones can 
conveniently represent systems of linear inequalities. But notice that the gener- 
ator of all inequalities implied by the system is also the generator of all linear 
expressions that are non-positive in every solution of the system. Our algorithm 
exploits this observation to derive a generator of the bounded decreasing
expressions for $(\Sigma, \mathcal{R})$ from two systems of linear inequalities - the first characterizing
the set $\Sigma$ and the second characterizing $\mathcal{R}$.



Computing the Cycle Invariant 

The first step of the algorithm is to compute an invariant $\mathcal{I}$ characterizing $\Sigma$.
For this step we assume the existence of an invariant generator that extracts an 
invariant for each control location from the description of the system. Further- 
more we posit that the generated invariants are systems of linear inequalities, 
and that they are sufficiently strong. These assumptions are reasonable, given 
the success of the automatic invariant generation techniques proposed in [CH77]. 

In an effort to increase the utility of the generated invariant for computing 
the set of bounded expressions in the next step of the algorithm, we automati- 
cally augment the system with auxiliary variables. For each system variable, an 
auxiliary variable is added which assumes the value of the corresponding system 
variable upon entry into the cycle and which never changes in value while the 
computation remains within the cycle. Thus, these variables can be considered 
symbolic constants within the cycle. 

To see the effect of such augmentation, consider again the program TERMINATE,
shown in Fig. 1. An invariant for location $\ell_1$ is

$$\mathcal{I} : i \le 100 \;\wedge\; j - k \le 0 .$$

This invariant bounds $i$ from above by the constant 100, but neither $j$ nor $k$
is bounded. However, the augmented program, shown in Fig. 3, produces the
invariant

$$\mathcal{I} : i \le 100 \;\wedge\; j - k \le 0 \;\wedge\; k - k_0 \le 0 ,$$

in which $i$, $j$, and $k$ are all bounded (since $j \le k_0$ is a consequence of
$j - k \le 0$ and $k - k_0 \le 0$).
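The derivation of $j \le k_0$ above is a non-negative combination of two conjuncts of the invariant. The following is a minimal Python sketch of that step. The encoding of each inequality as a coefficient vector over (i, j, k, k0, 1) and the helper name `combine` are assumptions made for illustration; they are not the generator representation used by the paper's cone library.

```python
from fractions import Fraction

# Each vector holds the coefficients of (i, j, k, k0, 1) in an
# inequality  a_i*i + a_j*j + a_k*k + a_k0*k0 + b <= 0.
# This encoding is an assumption of the sketch.
I_LE_100 = (1, 0, 0, 0, -100)   # i - 100 <= 0
J_LE_K = (0, 1, -1, 0, 0)       # j - k   <= 0
K_LE_K0 = (0, 0, 1, -1, 0)      # k - k0  <= 0


def combine(weighted_rows):
    """Return the non-negative linear combination of the given rows."""
    width = len(weighted_rows[0][1])
    total = [Fraction(0)] * width
    for weight, row in weighted_rows:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        for idx, coeff in enumerate(row):
            total[idx] += Fraction(weight) * coeff
    return tuple(total)


# (j - k) + (k - k0) = j - k0, so j - k0 <= 0 is a consequence.
J_LE_K0 = combine([(1, J_LE_K), (1, K_LE_K0)])
```

Any other non-negative weighting of the three rows yields another inequality implied by the invariant, which is the sense in which the conjuncts generate a cone of consequences.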

Fig. 4 shows the generator of the consequences of $\mathcal{I}$, where, as explained
in Sec. 3, the first ray represents the tautology $-1 < 0$, the second ray allows
strict inequalities to be weakened, and the remaining rays represent the three
conjuncts of the invariant.

    i, j, k    : integer
    i0, j0, k0 : integer

    ℓ-1 : (i0, j0, k0) := (i, j, k)
    ℓ0  : while i ≤ 100 ∧ j ≤ k do
              ℓ1 : (i, j) := (j, i + 1)
              ℓ2 : k := k − 1
    ℓ3  : halt

Fig. 3. Augmented version of TERMINATE

Fig. 4. Generator of the consequences of $\mathcal{I}$



Computing the Bounded Expressions 

The second step of the algorithm is to compute the generator of bounded
expressions. Recall that a function $f : \Sigma \mapsto \mathrm{Rat}$ is bounded if there exists a
constant $\Lambda$ such that $f(s) \ge \Lambda$ for every $s \in \Sigma$. That is, $f$ is bounded if
$-f + \Lambda \le 0$ is implied by $\mathcal{I}$, or equivalently, if $-f + \Lambda$ is in the cone
generated by $\mathcal{I}$, for some constant expression $\Lambda$. The generator of negations of
bounded expressions is computed by projecting $\mathcal{I}$ onto the system variables. In
fact, we project $\mathcal{I}$ with the ray $(0, \ldots, 0, 1)$ added, since strictness is not
relevant for establishing boundedness. We then negate this generator, using the
following result.

Proposition 1 Let $G = \{r_i\}$, where $r_i = (a_{i1}, \ldots, a_{id}, \delta_i)$, and $G' = \{r'_i\}$, with
$r'_i = (-a_{i1}, \ldots, -a_{id}, \delta_i)$. Then $(a_1, \ldots, a_d, \delta) \in G$ iff $(-a_1, \ldots, -a_d, \delta) \in G'$.

Fig. 5 presents the generator of bounded expressions for program TERMINATE.
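The negation step of Proposition 1 can be sketched in a few lines of Python. The list `G` below is a hypothetical generator over the columns (i, j, k, strictness), chosen only for illustration; the function name `negate_generator` is likewise an assumption of this sketch.

```python
def negate_generator(rays):
    """Negate every coefficient of each ray except the last component,
    which carries the strictness indication and is left unchanged."""
    return [tuple(-a for a in ray[:-1]) + (ray[-1],) for ray in rays]


# Hypothetical generator over (i, j, k, strictness), for illustration only.
G = [(0, 0, 0, 1), (1, 0, 0, 0), (0, 1, -1, 0), (0, 0, 1, 0)]
G_neg = negate_generator(G)
```

With this choice of `G`, the negated rays read as "-i is bounded", "-j + k is bounded" and "-k is bounded", alongside the strictness ray.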



Computing the Decreasing Expressions 

The third step of the algorithm is to compute a generator of expressions that
decrease discretely around the cycle. Recall that a function $f : \Sigma \mapsto \mathrm{Rat}$ is
discretely decreasing if there exists a positive constant $\Delta$ such that, for every
$(s, s') \in \mathcal{R}$, $f(s) \ge f(s') + \Delta$. Thus, the discretely decreasing expressions are
exactly those expressions $f$ such that $f \ge f' + \Delta$ is implied by the transition
relation $\mathcal{R}$, for some positive constant $\Delta$. Alternatively, they are those $f$ for
which $-f + f' + \Delta$ is in the cone generated by $\mathcal{R}$, with $\Delta > 0$ implied by $\mathcal{I}$.

       i    j    k    ε
    (  0,   0,   0,   1 )    any strictness
    ( -1,   0,   0,   0 )    -i is bounded
    (  0,  -1,   1,   0 )    -j + k is bounded
    (  0,   0,  -1,   0 )    -k is bounded

Fig. 5. Generator of bounded expressions

The generator of the decreasing expressions is computed incrementally: First
we transform $\mathcal{I}$ into a generator of the positive constant expressions $\Delta$. Then
we restrict $\mathcal{R}$ to generate only those expressions of the form $-f + f' + \Delta$, with
$\Delta$ a constant expression. This restricted generator is then further constrained to
ensure that $\Delta$ is in fact positive. This result, when projected onto the coefficients
of primed system variables, yields the set of decreasing expressions.

The positive constant expressions $\Delta$ cannot be represented directly, as they
do not form a cone. For example, they are not closed under scaling by zero.
Therefore, we adopt the technique introduced in Sec. 3 for representing strict
inequalities, and compute a generator of the non-negative constant expressions
along with an indication of strictness. Now, $\Delta$ is a non-negative constant
expression iff $-\Delta$ is non-positive and the coefficients of the system variables, both
primed and unprimed, are zero in $\Delta$. That is, for $\Delta$ to be a constant expression,
only the auxiliary variables can have non-zero coefficients.

Recall that $\mathcal{I}$ generates the set of all non-positive expressions. So the polar
of $\mathcal{I}$ is a system of constraints on the coefficients of these expressions, and every
solution of the polar is the coefficient vector of some non-positive expression.
Adding the equalities $a_1 = 0, \ldots, a_d = 0$ to the polar yields the subset of
non-positive expressions in which the system variables all have zero coefficients,
assuming $d$ system variables. Thus, the polar of the augmented polar is precisely
the set of non-positive constant expressions. By negating the generator of this
set, we arrive at a generator of non-negative constant expressions.

Applying this transformation to the invariant $\mathcal{I}$ of TERMINATE and eliminating
the system variables yields the generator shown in Fig. 6. The only positive
constant expression is 1.

Next we compute the generator of that subset of the expressions generated
by $\mathcal{R}$ which have the form $-f + f' + \Delta$, for some non-negative constant
expression $\Delta$. Again, this result is achieved by taking the polar of an augmented
polar. First, the ray $(0, \ldots, 0, 1)$ is added to $\mathcal{R}$, and the polar of the augmented
generator is computed. The strictness of the non-negativity of expressions in $\mathcal{R}$
is not relevant, and adding the ray eliminates any constraints which $\mathcal{R}$ places
on $\delta$. Next, the equalities $a_1 = -a_{d+1}, \ldots, a_d = -a_{2d}$ are added to the polar,
thereby restricting its solutions to those expressions in which the coefficient of
each unprimed system variable is the negation of the coefficient of the
corresponding primed system variable. Finally, the system is augmented with all of
the constraints on the coefficients of non-negative constant expressions. That is,
we add all equalities and inequalities that result from taking the polar of the
non-negative constant expressions computed earlier.

    ( 0, 0, 0, 1, 1 )    1 is positive
    ( 0, 0, 0, 1, 0 )    1 is non-negative

Fig. 6. Generator of non-negative constant expressions.

The resulting system is precisely the set of constraints satisfied by the
coefficients of the expressions we seek. The vector $(a_1, \ldots, a_{3d}, \beta, \delta)$ is a solution of
this system iff the corresponding expression has the form $-f + f' + \Delta$, where $\Delta$
is a non-negative constant expression. Furthermore, if $\delta > 0$, then $\Delta$ is positive.
Taking the polar of this system and projecting the result onto the coefficients of
the primed system variables and $\varepsilon$ yields the generator of a set of expressions all
of whose strict members are discretely decreasing.

Continuing with the program TERMINATE, the generator of the decreasing
expressions is shown in Fig. 7.

Fig. 7. Generator of decreasing expressions.



Computing the Ranking Functions 

The final step of the algorithm intersects the bounded expressions with the
decreasing expressions. Any strict member of the resulting cone is a ranking
function.

The generator of the ranking functions for TERMINATE is shown in Fig. 8.
Thus $-i - j + k$ is a ranking function. Notice that $-i - j$ is also a ranking
function, since $(-1, -1, 0, 1)$ is a strict member of the generated
cone.

Fig. 8. Generator of ranking functions.
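As a sanity check on the two expressions named above, the following Python sketch executes TERMINATE and asserts, at every iteration, that each expression is bounded from below and decreases by a fixed positive amount. It is a test on sample inputs, not a proof, and it assumes the loop guard reads $i \le 100 \wedge j \le k$; the function name `run_terminate` is hypothetical.

```python
def run_terminate(i, j, k):
    """Execute TERMINATE from (i, j, k) and check both candidate
    ranking functions at the head of every loop iteration."""
    k0 = k            # auxiliary variable: value of k on entry to the cycle
    steps = 0
    while i <= 100 and j <= k:
        f_before = -i - j + k
        g_before = -i - j
        assert f_before >= -100          # bounded by a constant
        assert g_before >= -100 - k0     # bounded by a symbolic constant
        i, j = j, i + 1                  # simultaneous assignment at l1
        k = k - 1                        # assignment at l2
        assert f_before - (-i - j + k) == 2   # -i - j + k drops by 2
        assert g_before - (-i - j) == 1       # -i - j drops by 1
        steps += 1
    return steps
```

Because $-i - j + k$ drops by 2 per iteration and never falls below $-100$ inside the loop, the number of iterations is bounded by a function of the initial values.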



5 Application 

We applied our algorithm to a system modeling the biological mechanism of
lateral inhibition in cells, brought to our attention by Ronojoy and Tomlin [RT00].
Lateral inhibition, a mechanism extensively studied in biology [CMML96], causes 
a group of initially equivalent cells to differentiate. It is based on an intra- and 
intercellular feedback mechanism, whereby a cell developing into one type in- 
hibits its neighbors from developing into the same type. The result is a more or 
less regular pattern of cells of different types. 

In collaboration with David Dill, we abstracted the (continuous) model of 
differentiation of skin cells described in [CMML96] and [RT00] into a discrete
transition system. The system consists of a planar hexagonal configuration of 
cells. Cells can be black, white or gray, where black and white cells, if stable, 
lead to specialization into ciliated and unciliated cells, respectively. Cells tran- 
sition based on their own color and the colors of their six immediate neighbors. 
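Therefore we define the state of a cell by the two variables $\mathit{color} \in \{w, g, b\}$ and
$\mathit{ncolor} \in \{W, G, B\}$, where the value of ncolor is determined as follows:

$$W : \forall i.(n_i = \mathit{white} \vee n_i = \mathit{gray}) \wedge \exists i.(n_i = \mathit{white})$$

$$G : \forall i.(n_i = \mathit{gray})$$

$$B : \exists i.(n_i = \mathit{black})$$

with $n_i$ the neighbor cells. The transitions of a cell can then be described by

$$\tau_1 : w \wedge W \wedge g' \qquad \tau_2 : g \wedge W \wedge b' \qquad \tau_3 : g \wedge G \wedge (b' \vee w')$$

$$\tau_4 : b \wedge B \wedge g' \qquad \tau_5 : g \wedge B \wedge w'$$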
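The classification of a cell's neighborhood can be written as a small function. This is a minimal Python sketch; it assumes that the three cases partition the possible neighborhoods, so that W means "no black neighbor and at least one white neighbor". The function name and the string encoding of colors are illustrative choices.

```python
def ncolor(neighbors):
    """Classify the colors of a cell's neighbors as 'W', 'G', or 'B'."""
    if "black" in neighbors:
        return "B"                      # some neighbor is black
    if all(n == "gray" for n in neighbors):
        return "G"                      # every neighbor is gray
    return "W"                          # no black neighbor, at least one white
```

For instance, `ncolor(["gray"] * 5 + ["white"])` evaluates to `"W"`, since a single white neighbor suffices once no neighbor is black.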

The objective is to prove that this system, like its biological counterpart,
stabilizes for an arbitrary number of cells. To do so we attempt to find a ranking
function $F$ for the entire plane of cells $C$. We assume $F$ has the form

$$F = \sum_{c \in C} f(c) ,$$

where $f(c)$ is the measure of a single cell. To show that $F$ is a ranking function,
it is sufficient to show that its value decreases whenever any cell $c$ transitions.
Let $c$ be an arbitrary cell. We can write $F$ as $F = G_c + H_c$ with

$$G_c = f(c) + \sum_{i=1}^{6} f(n_i(c)) \quad\text{and}\quad H_c = \sum_{d \in C \setminus \{c, n_1(c), \ldots, n_6(c)\}} f(d) .$$

To show that $F$ is decreased by every transition of $c$, it is sufficient to show that
$G_c$ is decreased by every transition of $c$, as transitions of $c$ can affect only the
state of $c$ and the state of $c$'s neighbors, so $H_c$ is unaffected. Thus it suffices to
consider a group of seven cells ($c$ and its six neighbors) and determine whether
a function $f$ exists such that $G_c$ is a ranking function.

Capitalizing on the symmetry in the description, we take as variables the nine
states in which a cell and its neighbors can be: $\mathcal{N}_{wW}$, $\mathcal{N}_{wG}$, $\mathcal{N}_{wB}$, $\mathcal{N}_{gW}$, $\mathcal{N}_{gG}$,
$\mathcal{N}_{gB}$, $\mathcal{N}_{bW}$, $\mathcal{N}_{bG}$, $\mathcal{N}_{bB}$, with each variable denoting the number of cells among
the seven with that configuration, also taking into consideration the colors of
the neighbors of the neighbors. For example, if the set consists of a black center
cell with six white neighbors, then $\mathcal{N}_{bW} = 1$, $\mathcal{N}_{wB} = 6$ and all other variables
are zero^.

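Applying the algorithm to the transition system leads to the following ranking
function:

$$24\,\mathcal{N}_{bB} + 6\,\mathcal{N}_{wW} + 4\,\mathcal{N}_{wG} + 5\,(\mathcal{N}_{gW} + \mathcal{N}_{gG} + \mathcal{N}_{gB})$$

This ranking function was found earlier by Dill [Dil00] using an ILP solver. From
this ranking function we can conclude that with

$$f(c) = \begin{cases}
24 & \text{if } \mathit{color} = b \text{ and } \mathit{ncolor} = B \\
6 & \text{if } \mathit{color} = w \text{ and } \mathit{ncolor} = W \\
4 & \text{if } \mathit{color} = w \text{ and } \mathit{ncolor} = G \\
5 & \text{if } \mathit{color} = g \\
0 & \text{otherwise}
\end{cases}$$

the function $F$ is a ranking function for the entire plane of cells.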
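The per-cell measure translates directly into a lookup. The Python sketch below evaluates $f$ and the group sum $G_c$ for the example configuration of a black center cell with six white neighbors; the names `measure`, `group_measure` and the pair encoding `(color, ncolor)` are assumptions of this sketch.

```python
# Weights of the three non-gray configurations with a non-zero measure.
WEIGHTS = {("b", "B"): 24, ("w", "W"): 6, ("w", "G"): 4}


def measure(color, ncolor):
    """Per-cell measure f(c), read off the ranking function in the text."""
    if color == "g":
        return 5                         # gray cells weigh 5 for any ncolor
    return WEIGHTS.get((color, ncolor), 0)


def group_measure(cells):
    """G_c: the sum of f over a cell and its six neighbors."""
    return sum(measure(color, ncol) for color, ncol in cells)


# Black center whose neighbors are all white: the center sees 'W', and
# each white neighbor has a black neighbor (the center), hence 'B'.
stable_group = [("b", "W")] + [("w", "B")] * 6
```

The group sum of `stable_group` is 0, the smallest value the measure can take, which is consistent with this configuration being a stable pattern.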



6 Conclusions 

We have implemented our algorithm using the polyhedral cone library in the 
invariant generator of the Stanford Temporal Prover [BBC+95]. Our experience
thus far is that simple systems are easily handled, but systems with complex 
transition relations often exhaust the available memory before a ranking func- 
tion can be found. This is to be expected, as the library, based on the Double 
Description Method, represents each cone dually, i.e., by its generator and the 
generator of its polar. The generator of the polar, however, can be exponentially 
larger than the generator of the cone [McM70]. We are currently investigating an 
implementation of our method that avoids this explosion in space by maintaining 
parametric representations of cones, rather than computing polars explicitly. 

Assuming a space-efficient implementation is possible, the method, as pre- 
sented thus far, might still fail to find a ranking function when one exists. This 
incompleteness is due to the fact that the required bounds may not be linear 
expressions in the (auxiliary) variables. Future work includes finding a charac- 
terization of the class of systems for which our method is complete. 

^ In this configuration the values of the variables are independent of the colors of 
the neighbors of the neighbors. In general however, the variables are dependent on 
them. 
Abstract. The paper presents a method for the automatic verification
of a certain class of parameterized systems. These are bounded-data
systems consisting of N processes (N being the parameter), where each
process is finite-state. First, we show that if we use the standard deductive
INV rule for proving invariance properties, then all the generated
verification conditions can be automatically resolved by finite-state (BDD-based)
methods with no need for interactive theorem proving.

Next, we show how to use model-checking techniques over finite (and
small) instances of the parameterized system in order to derive
candidates for invariant assertions. Combining this automatic computation of
invariants with the previously mentioned resolution of the VCs
(verification conditions) yields a (necessarily) incomplete but fully automatic
sound method for verifying bounded-data parameterized systems. The
generated invariants can be transferred to the VC-validation phase
without ever being examined by the user, which explains why we refer to them
as “invisible”.

We illustrate the method on a non-trivial example of a cache protocol, 
provided by Steve German. 



1 Introduction 

Automatic verification of infinite-state systems in general, and parameterized
systems in particular, has been the focus of much research recently (see, e.g.,
[ES96,ES97,CFJ96,GS97,ID96,LS97,RKR+00]). Most of this research
concentrates on model checking techniques for verification of such systems, using
symmetry reduction and similar methods to make model checking more tractable.

In this paper we present a method for the automatic verification of a certain
class of parameterized systems using a deductive approach. The parameterized
systems we study are bounded-data systems consisting of $N$ processes ($N$ being
the parameter), where each process is finite-state and the number of its states is
independent of $N$. We first show that for a large and interesting set of assertions,
called R-assertions, there is a number, $N_0$, such that the verification condition
claiming that an R-assertion $\varphi$ is preserved by any step of the system is valid for
every $N > 1$ iff it is valid for every $N \le N_0$. Thus, to check for validity of such
verification conditions, it suffices to consider only parameterized systems with
up to $N_0$ processes. The number $N_0$ is small. In fact, it is linear in the number of
the local state variables of an individual process (i.e. logarithmic in the number
of local states of a single process).

* This research was supported in part by the Minerva Center for Verification of
Reactive Systems, a gift from Intel, a grant from the German - Israel Foundation for
Scientific Research and Development, and ONR grant N00014-99-1-0131.

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 82-97, 2001.

(c) Springer-Verlag Berlin Heidelberg 2001

Using the standard deductive INV rule for proving invariance properties, all 
the generated verification conditions for the systems we are considering are R- 
assertions. Thus, for these systems, verification of invariance properties using 
INV can be automatically resolved by finite-state (BDD-based) methods, with no 
need for interactive theorem proving. 

We also show how to use model-checking techniques over finite ($N_0$-process)
instances of the parameterized system in order to derive candidates for
invariant assertions. The combination of this automatic computation of invariants
with the previously mentioned resolution of the verification conditions (VCs)
yields a (necessarily) incomplete but fully automatic sound method for verifying
bounded-data parameterized systems. The generated invariants can be
transferred to the VC-validation phase without ever being examined by the user, which
explains why we refer to them as “invisible”.

We illustrate the method on a non-trivial example of a cache protocol, pro- 
vided by Steve German. In this example, N client processes may request shared 
or exclusive access to a shared cache line. A Home process coordinates the cache 
access. Using our approach, we managed to automatically verify the property of 
coherence by which, if one process has an exclusive access to the cache line, then 
no other process may have any access right to the same line, even a shared one. 
We verified this property for any $N > 1$ using only the instance of $N = 4$.

Related Work 

The problem of uniform verification of parameterized systems is, in general, 
undecidable [AK86]. There are two possible remedies to this situation: either 
we should look for restricted families of parameterized systems for which the 
problem becomes decidable, or devise methods which are sound but necessarily
incomplete, and hope that the system of interest will yield to one of these
methods.

Among the representatives of the first approach we can count the work of Ger- 
man and Sistla [SG92] which assumes a parameterized system where processes 
communicate synchronously, and shows how to verify single-index properties. 
Similarly, Emerson and Namjoshi [EN96] presented a PSPACE-complete algorithm
for verification of synchronously communicating processes. Many of these
methods fail when we move to asynchronous systems where processes communicate
by shared variables.

Perhaps the most advanced work following this approach is [EK00], which
considers a general parameterized system allowing several different classes of
processes. However, this work provides separate algorithms for the cases that the
guards are either all disjunctive or all conjunctive. A protocol such as the cache
example we consider in Section 6, which contains some disjunctive and some
conjunctive guards, cannot be handled by the methods of [EK00].

The sound but incomplete methods include methods based on explicit
induction ([EN95]), network invariants, which can be viewed as implicit induction
([KM95], [WL89], [HLR92], [LHR97]), methods that can be viewed as abstraction
and approximation of network invariants ([BCG86], [SG89], [CGJ95], [KP00]),
and other methods that can be viewed as based on abstraction ([ID96]). The
papers in [CR99a,CR99b,CR00] use structural induction based on the notion of
a network invariant but significantly enhance its range of applicability by using
a generalization of the data-independence approach which provides a powerful
abstraction capability, allowing it to handle networks with parameterized
topologies. Most of these methods require the user to provide auxiliary constructs, such
as a network invariant or an abstraction mapping. Other attempts to verify
parameterized protocols such as Burns' protocol [JL98] and Szymanski's algorithm
[GZ98,MAB+94,MP90] relied on abstraction functions or lemmas provided by
the user. The work in [LS97] deals with the verification of safety properties of
parameterized networks by abstracting the behavior of the system. PVS ([SOR93])
is used to discharge the generated VCs.

Among the automatic incomplete approaches, we should mention the
methods relying on “regular model-checking” [KMM+97,ABJN99,JN00,PS00], where
a class of systems which includes our bounded-data systems as a special case is
analyzed by representing linear configurations of processes as words in a regular
language. Unfortunately, many of the systems analyzed by this method cause
the analysis procedure to diverge, and special acceleration procedures have to be
applied which, again, require user ingenuity and intervention.

The works in [ES96,ES97,CEJ96,GS97] study symmetry reduction in order to 
deal with state explosion. The work in [ID96] detects symmetries by inspection 
of the system description. Perhaps the closest in spirit to our work is the work 
of McMillan on compositional model-checking (e.g. [McM98]), which combines 
automatic abstraction with finite-instantiation due to symmetry. What started 
our research was the observation that, compared to fully deductive
verification, McMillan's method requires significantly fewer auxiliary invariants, usually
down to 2 auxiliary lemmas. Our explanation for this phenomenon was that, by
performing model-checking instead of the usual one-step induction, his model- 
checker computes many of the necessary auxiliary invariants automatically. This 
led us to the conjecture that we can compute the full invariant characterizing 
the reachable states automatically by considering just a few processes, and then 
abstract and generalize it automatically to any number of processes, which is 
the basis for our method. 

2 Bounded-Data Parameterized Systems 

We consider systems whose variables can be declared as follows:

    N : natural where N > 1
    V = { x_1, ..., x_a : boolean
          y_1, ..., y_b : [1..N]
          z_1, ..., z_c : array [1..N] of boolean }

Variable $N$ is the system's parameter which, with no loss of generality, we
assume to be bigger than 1. Note that we do not allow parameterized arrays
whose elements range over $[1..N]$. Such data types will take us beyond the scope
of bounded-data parameterized systems. We can easily extend the data-type
restrictions to allow arbitrary finite types instead of just booleans. Thus, we could
allow a $z_r$ to be a parameterized array of any finite type, and let an $x_s$ range
over such a type.

We refer to the set of variables $\{y_1, \ldots, y_b\}$ as $Y$. In addition to the system
variables, we also use a set of auxiliary variables $\mathit{Aux} = \{i, j, h, \ldots : [1..N]\}$.
We refer to the variables in $Y \cup \mathit{Aux}$, that range over the parametric domain
$[1..N]$, as Par-variables. We define a class of assertions, to which we refer as
R-assertions, as follows:

• $x_s$, $z_r[h]$, and $h = t$ are R-assertions, for $s = 1, \ldots, a$, every Par-variables
$h$ and $t$, and $r = 1, \ldots, c$. For the extended case that $z_r$ is an array over
the finite domain $D_r$, we also allow the atomic assertion $z_r[h] = d$ for every
constant $d \in D_r$.

• If $p$ and $q$ are R-assertions, then so are $\neg p$, $p \vee q$, and $\exists h : p$, for every
$h \in \mathit{Aux}$.

The other boolean operations and universal quantification can be defined using
the existing operators and negation. We write $p(h)$, $q(h, t)$, to denote that the
only auxiliary variables to which $p$ (respectively $q$) may refer are $h$ (respectively
$h, t$). An R-assertion $p$ is said to be closed if it contains no free occurrence of an
auxiliary variable.

A bounded-data discrete system (BDS) $S = \langle V, \Theta, \rho \rangle$ consists of

• $V$ - A set of system variables, as described above. A state of the system $S$
provides a type-consistent interpretation of the system variables $V$. For a
state $s$ and a system variable $v \in V$, we denote by $s[v]$ the value assigned to
$v$ by the state $s$. Let $\Sigma$ denote the set of states over $V$.

• $\Theta(V)$ - The initial condition. An R-assertion characterizing the initial states.

• $\rho(V, V')$ - The transition relation. An R-assertion, relating the values $V$ of
the variables in state $s \in \Sigma$ to the values $V'$ in an $S$-successor state $s' \in \Sigma$.

We require that $\rho$ has the special form

$$\rho = \exists h : \bigvee_{\ell = 1, \ldots, M} \bigl( p_\ell(h) \wedge \forall t : q_\ell(h, t) \bigr)$$

where $h, t \in \mathit{Aux}$, and $p_\ell(h)$, $q_\ell(h, t)$ are quantifier-free R-assertions which may
refer to both $V$ and $V'$.

Typically, a bounded-data parameterized system is a parallel composition
$H \parallel P[1] \parallel \cdots \parallel P[N]$. The R-assertion $p_\ell(h)$ often describes the local effect of
taking a transition $\tau_\ell$ within process $P[h]$, while $q_\ell(h, t)$ describes the effect of
this transition on all other processes. Usually, $q_\ell(h, t)$ will say that the local
variables of all processes $P[t]$, for $t \ne h$, are preserved under a step of process
$P[h]$. Note that a state $s$ of a bounded-data system should also interpret the
parameter $N$. We refer to $s[N]$ as the size of the global state $s$.

Since in this paper we only consider the verification of invariance properties,
we omitted from the definition of a BDS the components that relate to fairness.
When we extend these methods to liveness, we will add
the relevant fairness components.

To illustrate the representation of a parameterized system as a BDS, consider
program MUX-SEM, presented in Fig. 1. The semaphore instructions “request x”
and “release x” appearing in the program stand, respectively, for

    ⟨when x = 1 do x := 0⟩   and   x := 1



    in    N : natural where N > 1
    local x : boolean where x = 1

     N
     ∥  P[h] ::  loop forever do
    h=1              I : Non-Critical
                     T : request x
                     C : Critical
                     E : release x

Fig. 1. Program MUX-SEM

In Fig. 2, we present the BDS which corresponds to program MUX-SEM. Note
that the BDS standardly contains the additional system array variable $\pi[1..N]$,
which represents the program counter in each of the processes.



    V : { N : natural where N > 1
          x : boolean where x = 1
          π : array [1..N] of {I, T, C, E} }

    Θ : x ∧ ∀h : [1..N] : π[h] = I

    ρ : ∃h : [1..N] : (    π'[h] = π[h] ∧ x' = x
                         ∨ π[h] = I ∧ π'[h] = T ∧ x' = x
                         ∨ π[h] = T ∧ x = 1 ∧ π'[h] = C ∧ x' = 0
                         ∨ π[h] = C ∧ π'[h] = E ∧ x' = x
                         ∨ π[h] = E ∧ π'[h] = I ∧ x' = 1 )
                       ∧ ∀t ≠ h : π'[t] = π[t]

Fig. 2. The BDS corresponding to program MUX-SEM.

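A computation of the BDS $S = \langle V, \Theta, \rho \rangle$ is an infinite sequence of states
$\sigma : s_0, s_1, s_2, \ldots$, satisfying the requirements:

• Initiality: $s_0$ is initial, i.e., $s_0 \models \Theta$.

• Consecution: For each $\ell = 0, 1, \ldots$, the state $s_{\ell+1}$ is an $S$-successor of $s_\ell$.
That is, $\langle s_\ell, s_{\ell+1} \rangle \models \rho(V, V')$ where, for each $v \in V$, we interpret $v$ as $s_\ell[v]$
and $v'$ as $s_{\ell+1}[v]$.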

The definitions of R-assertions and BDS are such that the only tests applied
to Par-variables are equalities (and disequalities). Consequently, states,
computations, and satisfaction of R-assertions are all symmetric with respect to an
arbitrary permutation of indices. Consider the system instance $S(N_0)$, i.e., an
instance of the system in which $N$ has the value $N_0$. Let $\Pi : [1..N_0] \mapsto [1..N_0]$
be a permutation on the indices $[1..N_0]$. We say that the state $\tilde{s}$ is a $\Pi$-variant
of $s$, denoted $\tilde{s} = s[\Pi]$, if the following holds:

• $\tilde{x}_r = x_r$, for every $r \in [1..a]$.

• $\tilde{y}_r = \Pi^{-1}(y_r)$, for every $r \in [1..b]$.

• $\tilde{z}_r[h] = z_r[\Pi(h)]$, for every $r \in [1..c]$, $h \in [1..N_0]$.

where we write $\tilde{v}$ to denote the value of $v \in V$ in $\tilde{s}$, while writing simply $v$
denotes the value of this variable in state $s$.

For example, applying the permutation

    Π : 1 → 2, 2 → 3, 3 → 1

to the state

    s : ⟨z[1] : 10, z[2] : 20, z[3] : 30; y_1 : 1; y_2 : 2⟩

yields the state

    s̃ : ⟨z[1] : 20, z[2] : 30, z[3] : 10; y_1 : 3; y_2 : 1⟩
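The three clauses of the variant definition can be replayed on the example. The following Python sketch represents a state as a dictionary with entries for the x, y and z variables and a permutation as a dictionary from index to image; this representation and the function name `variant` are assumptions made for illustration.

```python
def variant(state, perm):
    """Return the variant of `state` under the permutation `perm`,
    where `perm` maps each index to its image."""
    inverse = {image: index for index, image in perm.items()}
    return {
        # Boolean variables are unchanged.
        "x": dict(state["x"]),
        # Par-valued variables are mapped through the inverse permutation.
        "y": {r: inverse[v] for r, v in state["y"].items()},
        # Array entry h of the variant is entry perm[h] of the original.
        "z": {r: {h: arr[perm[h]] for h in arr}
              for r, arr in state["z"].items()},
    }


perm = {1: 2, 2: 3, 3: 1}
s = {"x": {}, "y": {1: 1, 2: 2}, "z": {1: {1: 10, 2: 20, 3: 30}}}
s_variant = variant(s, perm)
```

Here `s_variant` has `y` values 3 and 1 and array contents 20, 30, 10, matching the state obtained in the example above.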

Given an infinite state sequence, $\sigma : s_0, s_1, \ldots$ and a permutation $\Pi$, we define the
$\Pi$-variant of $\sigma$, denoted $\sigma[\Pi]$, to be the state sequence $\sigma[\Pi] = s_0[\Pi], s_1[\Pi], \ldots$.
The following claim makes the statement of symmetry precise.



Claim (Statement of Symmetry). Let $S = \langle V, \Theta, \rho \rangle$ be a BDS, and $\Pi$ be a
permutation with finite domain. Then

• For a closed R-assertion $p$ and a state $s$, $s \models p$ iff $s[\Pi] \models p$. This leads to
the following consequences:

• State $s \models \Theta$ iff $s[\Pi] \models \Theta$.

• State $s_2$ is a $\rho$-successor of $s_1$ iff $s_2[\Pi]$ is a $\rho$-successor of $s_1[\Pi]$.

• $\sigma : s_0, s_1, \ldots$ is a computation of $S$ iff $\sigma[\Pi]$ is a computation of $S$.



From now on, we will refer to R-assertions simply as assertions.



3 Verification Methods 

In this section we will briefly survey the two main approaches to verification: 
Enumeration and Deduction. Both establish a property of the type $S \models \Box\, p$ for
an assertion $p$.

3.1 The Method of Enumeration: Model Checking 

For an assertion $p = p(V)$ and transition relation $\rho = \rho(V, V')$, we define the
$\rho$-postcondition of $p$, denoted by $p \circ \rho$, by the formula

$$p \circ \rho = \mathit{unprime}(\exists V : p(V) \wedge \rho(V, V'))$$

The operation unprime is the syntactic replacement of each primed occurrence $v'$
by its unprimed version $v$.

We can also define the iterated computation of postconditions:

$$p \circ \rho^* = p \;\vee\; p \circ \rho \;\vee\; (p \circ \rho) \circ \rho \;\vee\; ((p \circ \rho) \circ \rho) \circ \rho \;\vee\; \cdots,$$

which, for finite-state systems, is guaranteed to terminate. Using this concise
notation, verification by model checking can be summarized by the following
claim:

Claim (Model Checking). Let $S = \langle V, \Theta, \rho \rangle$ be a finite-state system and $p$ an
assertion. Then, $S \models \Box\, p$ iff the implication

$$\Theta \circ \rho^* \rightarrow p$$

is valid.
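For a fixed number of processes, the iterated postcondition is simply the set of reachable states. The Python sketch below computes it for MUX-SEM by explicit enumeration and checks mutual exclusion. Explicit-state search is used here in place of the BDD-based representation mentioned in the paper, and the encoding of a state as a pair (x, tuple of locations) is an assumption of the sketch.

```python
def successors(state):
    """All successors of a MUX-SEM state (x, pc), following Fig. 2."""
    x, pc = state
    result = {state}                                  # idling step
    for h, loc in enumerate(pc):
        if loc == "I":
            nxt = (x, "T")
        elif loc == "T" and x == 1:
            nxt = (0, "C")                            # request x
        elif loc == "C":
            nxt = (x, "E")
        elif loc == "E":
            nxt = (1, "I")                            # release x
        else:
            continue                                  # T with x = 0: blocked
        new_x, new_loc = nxt
        result.add((new_x, pc[:h] + (new_loc,) + pc[h + 1:]))
    return result


def reachable(n):
    """States reachable from the initial state, for n processes."""
    init = (1, ("I",) * n)
    seen, frontier = {init}, [init]
    while frontier:
        state = frontier.pop()
        for nxt in successors(state) - seen:
            seen.add(nxt)
            frontier.append(nxt)
    return seen


def mutual_exclusion(state):
    """At most one process is at location C."""
    return state[1].count("C") <= 1
```

Calling `all(mutual_exclusion(s) for s in reachable(n))` for a small `n` plays the role of checking the implication in the claim for that one instance; it says nothing about other values of N.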
3.2 Deductive Verification: The Invariance Rule

Assume that we wish to prove that assertion $p$ is an invariant of system $S$. The
method of deductive verification suggests that the user comes up with an
auxiliary assertion $\varphi$, intended to be an over-approximation of the set of reachable
states, and then show that $\varphi$ implies $p$. This can be summarized by rule INV,
presented in Fig. 3.

    I1.  Θ → φ
    I2.  φ ∧ ρ → φ'
    I3.  φ → p
    ------------------
         □ p

Fig. 3. The invariance rule INV.

An assertion $\varphi$ satisfying premises I1 and I2 is called inductive. An inductive
assertion is always an over-approximation of the set of reachable states. Premise
I3 ensures that assertion $\varphi$ is a strengthening (under-approximation) of the
property $p$. In rare cases, the original assertion $p$ is already inductive. In all other
cases, the deductive verifier has to perform the following tasks:

T1. Divine (invent) the auxiliary assertion $\varphi$.

T2. Establish the validity of premises I1-I3.

For the case that the system $S$ is finite-state, all the assertions can be represented
by BDDs. Validity of these premises can then be checked by computing the BDD
of their negations, and checking that it equals 0 (false). For the case that $S$ is not
a finite-state system, for example, if it is a BDS, one traditionally uses interactive
theorem provers such as PVS [SOR93] and STeP [MAB+94].
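For one fixed instance of MUX-SEM, the three premises can be checked by brute-force enumeration over all states of that size. The Python sketch below does so with a hand-written candidate auxiliary assertion; the candidate `phi`, the state encoding, and the use of explicit enumeration rather than BDDs are all assumptions of this illustration.

```python
from itertools import product

LOCS = ("I", "T", "C", "E")


def successors(state):
    """All successors of a MUX-SEM state (x, pc), following Fig. 2."""
    x, pc = state
    result = {state}                                  # idling step
    moves = {"I": (x, "T"), "C": (x, "E"), "E": (1, "I")}
    if x == 1:
        moves["T"] = (0, "C")                         # request x succeeds
    for h, loc in enumerate(pc):
        if loc in moves:
            new_x, new_loc = moves[loc]
            result.add((new_x, pc[:h] + (new_loc,) + pc[h + 1:]))
    return result


def all_states(n):
    """Every type-consistent state of size n, reachable or not."""
    return [(x, pc) for x in (0, 1) for pc in product(LOCS, repeat=n)]


def phi(state):
    """Candidate auxiliary assertion: the semaphore is free exactly when
    no process is at C or E, and otherwise exactly one process is there."""
    x, pc = state
    holders = sum(loc in ("C", "E") for loc in pc)
    return (x == 1 and holders == 0) or (x == 0 and holders == 1)


def prop(state):
    """The property to prove: at most one process is at C."""
    return state[1].count("C") <= 1


def check_inv(n):
    """Return the truth values of premises I1, I2, I3 for size n."""
    init = (1, ("I",) * n)
    i1 = phi(init)
    i2 = all(phi(t) for s in all_states(n) if phi(s) for t in successors(s))
    i3 = all(prop(s) for s in all_states(n) if phi(s))
    return i1, i2, i3
```

Note that premise I2 ranges over every state satisfying `phi`, not only the reachable ones, which is what distinguishes this check from a reachability computation.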

Performing interactive first-order verification of implications such as the 
premises of rule INV for any non-trivial system is never an easy task. Neither 
is it a one-time task, since the process of developing the auxiliary invariants 
requires iterative verification trials, where failed efforts lead to correction of the 
previous candidate assertion into a new candidate. Therefore, our first efforts 
were directed towards the development of methods which will enable establish- 
ing the validity of the premises of Rule INV for bounded-data parameterized 
systems in a fully automated manner. 



4 Deciding the Verification Conditions 

In this section, we outline a decision procedure for establishing the validity of the
verification conditions generated by rule INV for bounded-data parameterized
systems. Consider first the case that the auxiliary assertion $\varphi$ has the form
$\varphi = \forall i : \psi(i)$, where $\psi(i)$ is a quantifier-free (R-) assertion. The most complex
verification condition is premise I2 which can be written as:

$$(\forall i : \psi(i)) \;\wedge\; \Bigl(\exists h : \bigvee_{\ell=1,\ldots,M} p_\ell(h) \wedge \forall t : q_\ell(h,t)\Bigr) \;\rightarrow\; \forall i : \psi'(i) \qquad (1)$$

The following claim states that, for a bounded-data parameterized system $S(N)$,
condition (1) can be decided by establishing it over finitely (and not too) many
instances of $S(N)$.

Claim. Let $S(N)$ be a bounded-data parameterized system. Then, the
implication (1) is valid over $S(N)$ for all $N > 1$ iff it is valid over $S(N)$ for all $N$,
$1 < N \le 2b + 2$, where $b$ is the size of $Y$.

For example, the claim states that it is sufficient to check the premises of rule
INV over MUX-SEM(2) in order to establish their validity over all instances of
MUX-SEM($N$).

Proof: (Sketch) Let $N_0 = 2 + 2b$. To prove the claim, it is sufficient to show
that the negation of condition (1), given by

$$(\forall j : \psi(j)) \;\wedge\; \Big(\exists h : \bigvee_{\ell=1,\ldots,M} p_\ell(h) \wedge \forall t : q_\ell(h,t)\Big) \;\wedge\; \exists i : \neg\psi'(i) \qquad (2)$$

is satisfiable for some $N > 1$ iff it is satisfiable for some $1 < N \le N_0$. Clearly,
formula (2) is satisfiable iff the formula

$$(\forall j : \psi(j)) \;\wedge\; \bigvee_{\ell=1,\ldots,M} \big(p_\ell(h) \wedge \forall t : q_\ell(h,t)\big) \;\wedge\; \neg\psi'(i) \qquad (3)$$

is satisfiable. It suffices to show that if formula (3) is satisfiable over a state
(pair) of size $N > N_0$, it is also satisfiable over a state (pair) of size $N_0$.

Let $s$ be a state of size $N_1 > N_0$ which satisfies assertion (3). The state
$s$ assigns to the variables $V_{aug} = \{h, i, y_1, y_1', \ldots, y_b, y_b'\}$ values in the domain
$[1..N_1]$. Let $a \le N_0$ be the number of the different values assigned to those
variables, and assume these values are $v_1 < v_2 < \cdots < v_a$. There obviously
exists a permutation $\Pi$ on $[1..N_1]$ such that $\Pi^{-1}[v_k] = k$ for every $k = 1, \ldots, a$.
Let $\tilde{s}$ be the $\Pi$-variant of $s$, applying the permutation-induced transformation
described in Section 2 to the augmented set of state variables $V_{aug} = V \cup \{h, i\}$.
The size of $\tilde{s}$ is $N_1$, and, according to Claim 2, it satisfies assertion (3), which is
a closed assertion relative to the augmented variable set $V \cup \{h, i\}$.

We proceed to show how to derive a new state $s'$ of size $a \le N_0$ which also
satisfies assertion (3). The state $s'$ is defined by letting $s'[N] = a$ and letting $s'$ and $\tilde{s}$
agree on the interpretation of the variables $h, i, y_1, y_1', \ldots, y_b, y_b'$ and of the
$x$-variables together with their primed copies. For the remaining variables (the
$z$-variables) we let $s'$ and $\tilde{s}$ agree on the interpretation of every variable $z_r[k]$
and $z_r'[k]$ where $r \in [1..c]$ and $k \le a$.

It remains to show that if $\tilde{s}$ satisfies the $N_1$-version of assertion (3) then $s'$
satisfies the $a$-version of assertion (3), where the assertions are

$$\Big(\bigwedge_{j=1}^{N_1} \psi(j)\Big) \;\wedge\; \bigvee_{\ell=1,\ldots,M} \Big(p_\ell(h) \wedge \bigwedge_{t=1}^{N_1} q_\ell(h,t)\Big) \;\wedge\; \neg\psi'(i) \qquad (4)$$

and

$$\Big(\bigwedge_{j=1}^{a} \psi(j)\Big) \;\wedge\; \bigvee_{\ell=1,\ldots,M} \Big(p_\ell(h) \wedge \bigwedge_{t=1}^{a} q_\ell(h,t)\Big) \;\wedge\; \neg\psi'(i) \qquad (5)$$

respectively.

Since the difference between the two assertions is that the conjunctions in
assertion (5) extend only over the $[1..a]$ subrange of the conjunctions in assertion
(4), and since $s'$ and $\tilde{s}$ agree on the interpretation of variables in this subrange,
we conclude that $s'$ satisfies assertion (5). $\Box$

Claim 4 can be extended in several different ways. For example, we can
trivially modify it to establish that premises I1 and I3 of Rule INV can also be
checked only for systems of size not exceeding $2b + 2$. Another useful modification
applies to the case of Par-deterministic systems. A bounded-data system is said
to be Par-deterministic if, for every Par-variable $y_r$ and every disjunct $p_\ell(h)$
of the transition relation, $p_\ell(h)$ contains a conjunct of the form $y_r' = u$ for
some unprimed Par-variable $u$. Recall that the bound of $2b + 2$ was derived in
order to cover the possibility that $h, i, y_1, y_1', \ldots, y_b, y_b'$ may all assume disjoint
values. Under a Par-deterministic transition relation, all the primed variables
must assume values that are equal to the values of some unprimed variables.
Therefore, the set of variables $h, i, y_1, y_1', \ldots, y_b, y_b'$ can assume at most $b + 2$
distinct values. This leads to the following corollary:

Corollary 1. Let $S(N)$ be a Par-deterministic BDS. Then, the premises of rule
INV are valid over $S(N)$ for all $N > 1$ iff they are valid over $S(N)$ for all $N$,
$1 < N \le b + 2$.

The last extension considers the case that both the property $p$ to be proven
and the auxiliary invariant $\varphi$ have the form $\forall h, t : \psi(h,t)$ for some quantifier-free
(R-) assertion $\psi$.

Corollary 2. Let $S(N)$ be a bounded-data parameterized system, and let $p$ and
$\varphi$ both have the form $\forall h, t : \psi(h,t)$. Then, the premises of rule INV are valid
over $S(N)$ for all $N > 1$ iff they are valid over $S(N)$ for all $N$, $1 < N \le 2b + 3$.
In the case that $S(N)$ is Par-deterministic, it is sufficient to check the premises
for $N \le b + 3$.
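
The bounds stated in the claim and its two corollaries can be collected in one small helper. The following is only an illustrative sketch: the function name `cutoff` and its parameters are hypothetical, and the only facts it encodes are the four bounds $2b+2$, $b+2$, $2b+3$ and $b+3$ given above.

```python
def cutoff(b: int, doubly_indexed: bool = False, par_deterministic: bool = False) -> int:
    """Return the largest instance size N0 that has to be checked.

    b                 -- number of Par-variables y_1..y_b of the system
    doubly_indexed    -- True if p and the auxiliary assertion range over two indices
    par_deterministic -- True if every primed Par-variable copies an unprimed one
    """
    if b < 0:
        raise ValueError("b must be non-negative")
    # Constant term of the bound: 2 for singly indexed, 3 for doubly indexed assertions.
    constant = 3 if doubly_indexed else 2
    # Par-determinism halves the contribution of the Par-variables (b instead of 2b).
    par_values = b if par_deterministic else 2 * b
    return par_values + constant


# MUX-SEM has no Par-variables, so b = 0.
print(cutoff(0))                       # singly indexed assertion
print(cutoff(0, doubly_indexed=True))  # doubly indexed assertion
```

For MUX-SEM this reproduces the instance sizes 2 and 3 used in the examples of the next section.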

5 Automatic Calculation of the Auxiliary Invariants 

Providing a decision procedure for the premises of rule INV greatly simplifies the
process of deductive verification. Yet, it still leaves open the task of inventing the
strengthening assertion $\varphi$. As illustrated in the next section, this strengthening
assertion may become quite complex for all but the simplest systems.

Here we propose a heuristic for an algorithmic construction of an inductive
assertion for a given bounded-data parameterized system. Let us consider first
the case that we are looking for an inductive assertion of the form $\varphi = \forall h : \psi(h)$.
The construction algorithm can be described as follows:

Algorithm 1. Compute Auxiliary Assertion of the form $\forall h : \psi(h)$

1. Let $\mathit{reach}$ be the assertion characterizing all the reachable states of system
   $S(N_0)$, where $N_0 = 2b + 2$ (or $b + 2$ if $S$ is Par-deterministic). Since $S(N_0)$
   is finite-state, $\mathit{reach}$ can be computed by letting $\mathit{reach} := \Theta \circ \rho^*$.

2. Let $\psi_1$ be the assertion obtained from $\mathit{reach}$ by projecting away all the
   references to variables subscripted by indices other than 1. Technically, this
   is done by using BDD operations for computing

   $$\psi_1 = \exists z_1[2], \ldots, z_1[N_0], \ldots, z_c[2], \ldots, z_c[N_0] : \mathit{reach}$$

3. Let $\psi(h)$ be the assertion obtained from $\psi_1$ by abstraction, which involves
   the following transformations:

   • Replace any reference to $z_r[1]$ by a reference to $z_r[h]$.

   • Replace any sub-formula of the form $y_r = 1$ by the formula $y_r = h$, and
     any sub-formula of the form $y_r = v$ for $v \ne 1$ by the formula $y_r \ne h$.

Let us illustrate the application of this algorithm to program MUX-SEM (as
presented in Fig. 1). Since, for this program, $b = 0$, we take $N_0 = 2$ and obtain

$$\begin{array}{ll}
\mathit{reach} : & \big((x = 1) \wedge \pi[1] \in \{I,T\} \wedge \pi[2] \in \{I,T\}\big) \\
 & \vee\; \big((x = 0) \wedge (\pi[1] \in \{I,T\} \leftrightarrow \pi[2] \notin \{I,T\})\big) \\
\psi_1 : & (x = 1) \rightarrow \pi[1] \in \{I,T\} \\
\psi(h) : & (x = 1) \rightarrow \pi[h] \in \{I,T\}
\end{array}$$

Unfortunately, when we take the proposed assertion $\varphi : \forall h : (x = 1) \rightarrow \pi[h] \in \{I,T\}$
we find out that it is not inductive over $S(2)$. This illustrates the fact
that the above algorithm is not guaranteed to produce inductive assertions in
all cases.
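
The three steps of Algorithm 1 and the failed inductiveness check can be replayed with plain sets instead of BDDs. The sketch below is not the authors' implementation: it assumes an explicit-state encoding of MUX-SEM in which a state is a pair `(x, pi)`, and it assumes the usual transitions I to T, T to C (when x = 1, setting x to 0), C to E, and E to I (setting x to 1). All function names are hypothetical.

```python
from itertools import product

LOCS = ("I", "T", "C", "E")


def successors(state):
    """Yield the successors of a MUX-SEM state (x, pi); pi is a tuple of locations."""
    x, pi = state
    for h, loc in enumerate(pi):
        def at(new_loc):
            return pi[:h] + (new_loc,) + pi[h + 1:]
        if loc == "I":
            yield (x, at("T"))
        elif loc == "T" and x == 1:   # request x
            yield (0, at("C"))
        elif loc == "C":
            yield (x, at("E"))
        elif loc == "E":              # release x
            yield (1, at("I"))


def reachable(n):
    """Step 1: all states reachable from x = 1 with every process at I."""
    init = (1, ("I",) * n)
    seen, frontier = {init}, [init]
    while frontier:
        for t in successors(frontier.pop()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def project(reach):
    """Step 2: keep only x and the location of process 1."""
    return {(x, pi[0]) for x, pi in reach}


def generalize(psi1):
    """Step 3 plus instantiation: phi holds iff psi(h) holds for every process h."""
    return lambda state: all((state[0], loc) in psi1 for loc in state[1])


def is_inductive(phi, n):
    """Check that phi holds initially and is preserved from every phi-state."""
    universe = [(x, pi) for x in (0, 1) for pi in product(LOCS, repeat=n)]
    return phi((1, ("I",) * n)) and all(
        phi(t) for s in universe if phi(s) for t in successors(s)
    )


psi1 = project(reachable(2))
phi = generalize(psi1)
print(is_inductive(phi, 2))  # prints False
```

Under these assumptions the check fails for the same reason as in the text: the state with x = 0 and locations (E, C) satisfies the candidate, but releasing the semaphore leads to x = 1 with a process still at C.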

Another version of the algorithm can be used to compute candidates for
inductive assertions of the form $\varphi : \forall h \ne t : \psi(h,t)$.

Algorithm 2. Compute Auxiliary Assertion of the form $\forall h \ne t : \psi(h,t)$

1. Let $\mathit{reach}$ be the assertion characterizing all the reachable states of system
   $S(N_0)$, where $N_0 = 2b + 3$ (or $b + 3$ if $S$ is Par-deterministic).

2. Let $\psi_{1,2}$ be the assertion obtained from $\mathit{reach}$ by projecting away all the
   references to variables subscripted by indices other than 1 or 2.

3. Let $\psi(h,t)$ be the assertion obtained from $\psi_{1,2}$ by abstraction, which involves
   the following transformations:

   • Replace any reference to $z_r[1]$ by a reference to $z_r[h]$ and any reference
     to $z_r[2]$ by a reference to $z_r[t]$.

   • Replace any sub-formula of the form $y_r = 1$ by the formula $y_r = h$,
     any sub-formula of the form $y_r = 2$ by the formula $y_r = t$, and any
     sub-formula of the form $y_r = v$ for $v \notin \{1, 2\}$ by the formula $y_r \ne h \wedge y_r \ne t$.
Let us apply this algorithm again to system MUX-SEM. This time, we take $N_0 = 3$
and compute:

$$\begin{array}{ll}
\mathit{reach} : & (\pi[1] \in \{C,E\}) + (\pi[2] \in \{C,E\}) + (\pi[3] \in \{C,E\}) + x = 1 \\
\psi_{1,2} : & \big[\pi[1] \in \{C,E\} \rightarrow (x = 0) \wedge \pi[2] \in \{I,T\}\big] \\
 & \wedge\; \big[\pi[2] \in \{C,E\} \rightarrow (x = 0) \wedge \pi[1] \in \{I,T\}\big] \\
\psi(h,t) : & \big[\pi[h] \in \{C,E\} \rightarrow (x = 0) \wedge \pi[t] \in \{I,T\}\big] \\
 & \wedge\; \big[\pi[t] \in \{C,E\} \rightarrow (x = 0) \wedge \pi[h] \in \{I,T\}\big]
\end{array}$$

Taking $\varphi = \forall h \ne t : \psi(h,t)$ yields an assertion which is inductive over $S(3)$. By
Corollary 2, it follows that $\varphi$ is inductive for all $S(N)$. It is straightforward
to check that $\varphi$ implies the property of mutual exclusion $\forall h \ne t : \neg(\pi[h] = C \wedge \pi[t] = C)$
which we wished to establish for program MUX-SEM.









Amir Pnueli, Sitvanit Ruah, and Lenore Zuck 



5.1 The Integrated Processes 

The description of the two algorithms for computing auxiliary assertions may
have given some readers the false impression that there is a manual step
involved; for example, that after computing $\psi_1$ in Algorithm 1, we print it out
and ask the user to perform the abstraction herself. This is certainly not the case.
The whole process of deriving the candidate for inductive auxiliary assertion and
utilizing it for an attempt to verify the desired property is performed in a fully
automated manner. In fact, without an explicit request, the user never sees the
generated candidate assertion, which is the reason we refer to this method as
“verification by invisible invariants”.

To explain how the entire process is performed, we observe that steps (2) and
(3) of Algorithm 1 obtain a symbolic representation of $\psi(h)$. However, to check
that it is inductive over $S(N_0)$, we immediately instantiate $h$ in $\psi(h)$ to form
$\bigwedge_{j=1}^{N_0} \psi(j)$. In the integrated process, we perform these three steps together.
This is done by defining an abstraction relation $\alpha_j$ for each $j \in [1..N_0]$. The
abstraction relation is given by

$$\alpha_j : \; \bigwedge_{r=1}^{a} (x_r' = x_r) \;\wedge\; \bigwedge_{r=1}^{b} \big((y_r' = j) \leftrightarrow (y_r = 1)\big) \;\wedge\; \bigwedge_{r=1}^{c} (z_r'[j] = z_r[1])$$

This relation defines an abstract state consisting of the interpretation of a primed
copy which only cares about the interpretation of $z_r'[j]$, whether $y_r'$ equals
or is unequal to $j$, and the precise values of $x_r'$. These values correspond to
the interpretation of these variables for $j = 1$ in the unprimed state. Given the
assertion $\mathit{reach}$ which characterizes all the reachable states of $S(N_0)$, we can
form the assertion $\psi_j = \mathit{reach} \circ \alpha_j$. Then we claim that

The state $\tilde{s}$ is in $\|\psi_j\|$ iff there exists a state $s \in \|\mathit{reach}\|$, such that
    $\tilde{x}_r = x_r$ for every $r \in [1..a]$
    $\tilde{y}_r = j$ iff $y_r = 1$ for every $r \in [1..b]$
    $\tilde{z}_r[j] = z_r[1]$ for every $r \in [1..c]$

Thus, we reuse the operator $\circ$ for performing abstraction+instantiation instead
of computation of a successor, which is its customary use.

With this notation, we can describe the full verification process as follows:

Verification Process 3. Verify property $p$, using a singly indexed auxiliary
assertion.

1. Let $\mathit{reach} := \Theta \circ \rho^*$, computed over $S(N_0)$ for an appropriately chosen $N_0$.

2. Let $\psi_j := \mathit{reach} \circ \alpha_j$, for each $j \in [1..N_0]$.

3. Let $\varphi := \bigwedge_{j=1}^{N_0} \psi_j$.

4. Check that $\varphi$ is inductive over $S(N_0)$.

5. Check that $\varphi \rightarrow p$ is valid.

If tests (4) and (5) both yield positive results, then property $p$ has been verified.

To illustrate the application of Verification Process 3, consider the augmented
version of program MUX-SEM, presented in Fig. 4. In this program, we
added an auxiliary variable last_entered which is set to $h$ whenever process $P[h]$
enters its critical section. Applying Verification Process 3 to this program, we
obtained the calculated invariant
    in    N : natural where N > 1
    local x : boolean where x = 1
    local last_entered : [1..N]

    ‖_{h=1}^{N} P[h] ::
        loop forever do
            I : Non-Critical
            T : ⟨request x; last_entered := h⟩
            C : Critical
            E : release x

Fig. 4. Augmented Program MUX-SEM



$$\varphi : \forall h : \pi[h] \in \{C, E\} \leftrightarrow (x = 0 \wedge \mathit{last\_entered} = h)$$

The candidate assertion is inductive and also implies the property of mutual
exclusion, specifiable as

$$p : \forall h \ne t : \neg(\pi[h] = C \wedge \pi[t] = C)$$
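
The integrated process can also be replayed with explicit sets on the augmented program. The sketch below is an assumption-laden illustration rather than the authors' BDD-based tool: states are triples `(x, last_entered, pi)`, the transitions are read off Fig. 4 as best as it can be reconstructed, the initial value of last_entered is left arbitrary, and `abstract(s, j)` plays the role of $\alpha_j$ by recording x, whether last_entered equals j, and the location of process j. All names are hypothetical.

```python
from itertools import product

LOCS = ("I", "T", "C", "E")


def successors(state):
    """Successors of an augmented MUX-SEM state; processes are numbered from 1."""
    x, last, pi = state
    for h in range(1, len(pi) + 1):
        loc = pi[h - 1]
        def at(new_loc):
            return pi[:h - 1] + (new_loc,) + pi[h:]
        if loc == "I":
            yield (x, last, at("T"))
        elif loc == "T" and x == 1:   # request x; last_entered := h
            yield (0, h, at("C"))
        elif loc == "C":
            yield (x, last, at("E"))
        elif loc == "E":              # release x
            yield (1, last, at("I"))


def reachable(inits):
    seen, frontier = set(inits), list(inits)
    while frontier:
        for t in successors(frontier.pop()):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def abstract(state, j):
    """View of a state through index j: x, (last_entered = j), location of P[j]."""
    x, last, pi = state
    return (x, last == j, pi[j - 1])


def verify(n):
    """Steps 1-5 of the singly indexed process over the instance of size n."""
    inits = [(1, last, ("I",) * n) for last in range(1, n + 1)]
    psi1 = {abstract(s, 1) for s in reachable(inits)}           # reach seen through index 1
    phi = lambda s: all(abstract(s, j) in psi1 for j in range(1, n + 1))
    universe = [(x, last, pi) for x in (0, 1) for last in range(1, n + 1)
                for pi in product(LOCS, repeat=n)]
    inductive = all(phi(s) for s in inits) and all(
        phi(t) for s in universe if phi(s) for t in successors(s))
    mutex = lambda s: sum(loc == "C" for loc in s[2]) <= 1      # property p
    implies_p = all(mutex(s) for s in universe if phi(s))
    return inductive, implies_p


print(verify(4))  # prints (True, True)
```

The instance size 4 corresponds to $2b + 2$ with one Par-variable ($b = 1$); that choice is an assumption of this sketch, not a value stated in the text.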

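To handle the case of an auxiliary assertion which depends on two different
indices, we define the abstraction relations

$$\alpha_{ht} : \; \bigwedge_{r=1}^{a} (x_r' = x_r) \;\wedge\; \bigwedge_{r=1}^{b} \Big(\big((y_r' = h) \leftrightarrow (y_r = 1)\big) \wedge \big((y_r' = t) \leftrightarrow (y_r = 2)\big)\Big) \;\wedge\; \bigwedge_{r=1}^{c} \big((z_r'[h] = z_r[1]) \wedge (z_r'[t] = z_r[2])\big)$$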



We then formulate the verification process for doubly indexed assertions:

Verification Process 4. Verify property $p$, using a doubly indexed auxiliary
assertion.

1. Let $\mathit{reach} := \Theta \circ \rho^*$, computed over $S(N_0)$ for an appropriately chosen $N_0$.

2. Let $\psi_{ht} := \mathit{reach} \circ \alpha_{ht}$, for each $h < t \in [1..N_0]$.

3. Let $\varphi := \bigwedge_{h < t \in [1..N_0]} \psi_{ht}$.

4. Check that $\varphi$ is inductive over $S(N_0)$.

5. Check that $\varphi \rightarrow p$ is valid.



6 German’s Cache Case Study 

In this section we illustrate the application of the invisible-invariants verification
method to a case study which is a simple cache algorithm provided to us by Steve
German [Ger00]. The algorithm consists of a central controller called Home and
$N$ client processes $P[1], \ldots, P[N]$. Each of the clients communicates with Home
via the following channels:

- channel1: Client $P[c]$ uses this channel to send Home requests for either
  shared or exclusive access to the cache line.

- channel2: Home uses this channel to send $P[c]$ permissions (grants) for the
  requested access rights. It also sends on this channel requests to $P[c]$ to
  invalidate its cache status.
    in    N : natural where N > 1
    type  message = {empty, req_shared, req_exclusive, invalidate, invalidate_ack,
                     grant_shared, grant_exclusive}
    type  cache_state = {invalid, shared, exclusive}
    local channel1, channel2, channel3 : array[1..N] of message where
              ∀i : [1..N].(channel1[i] = channel2[i] = channel3[i] = empty)
    local sharer_list, invalidate_list : array[1..N] of bool where
              ∀i : [1..N].(sharer_list[i] = invalidate_list[i] = 0)
    local exclusive_granted : bool where exclusive_granted = 0
    local curr_command : message where curr_command = empty
    local curr_client : [1..N] where curr_client = 1
    local cache : array[1..N] of cache_state where ∀i : [1..N].(cache[i] = invalid)

Fig. 5. Variables for German's cache algorithm



- channel3: Client $P[c]$ uses this channel to send Home acknowledgments of
  invalidation of the client's cache status.

Fig. 5 presents the variables used in the algorithm. 

The algorithm can be presented as

$$\mathit{Home} \;\Big\|\; \Big\|_{c=1}^{N} P[c]$$

An SPL program for Home is presented in Fig. 6, and an SPL program
for $P[c]$ is presented in Fig. 7.

The main property we wish to verify for this system is that of coherence, by
which there cannot be two clients, $c$ and $d$, such that $P[c]$ holds an exclusive
access to the cache line while $P[d]$ holds a shared access to the same cache line
at the same time. This can be specified by the invariance of the assertion

$$\forall c \ne d : \neg(\mathit{cache}[c] = \mathit{exclusive} \wedge \mathit{cache}[d] = \mathit{shared}) \qquad (6)$$

Following are the results of our verification experiments applied to the cache 
algorithm: 

1. We applied Verification Process (3) to the cache program. The computed 
candidate assertion failed to be inductive. 

2. We augmented the cache program with an auxiliary variable last_granted
which is assigned the value of curr_client in transitions $m_0$ and $m_1$. We
then applied Verification Process (3) to the augmented program. This time,
the candidate assertion proved to be inductive and implied the property
of coherence. It took 1.97 seconds to compute the candidate assertion, and
31.43 seconds to check that it is inductive (over an instance of the program
with $N = 4$).

3. We applied Verification Process (4) to the original cache program. It pro- 
duced an inductive assertion which implied the property of coherence. It 
took 15.42 seconds to compute the candidate assertion, and 186.82 seconds 
to check that it is inductive. 
    loop forever do

      m0: ⟨when curr_command = req_shared ∧ ¬exclusive_granted
                ∧ channel2[curr_client] = empty
           do   sharer_list[curr_client] := true; curr_command := empty;
                channel2[curr_client] := grant_shared⟩
      or
      m1: ⟨when curr_command = req_exclusive ∧ channel2[curr_client] = empty
                ∧ ∀i : [1..N].sharer_list[i] = false
           do   sharer_list[curr_client] := true; curr_command := empty;
                exclusive_granted := true; X_granted := curr_client;
                channel2[curr_client] := grant_exclusive⟩
      or
      m2: ⟨when curr_command = empty ∧ channel1[c] ≠ empty
           do   curr_command := channel1[c]; channel1[c] := empty;
                invalidate_list := sharer_list; curr_client := c⟩
      or
      m3: ⟨when ((curr_command = req_shared ∧ exclusive_granted)
                 ∨ curr_command = req_exclusive)
                ∧ invalidate_list[c] ∧ channel2[c] = empty
           do   channel2[c] := invalidate; invalidate_list[c] := false⟩
      or
      m4: ⟨when curr_command ≠ empty ∧ channel3[c] = invalidate_ack
           do   sharer_list[c] := false; exclusive_granted := false;
                channel3[c] := empty⟩

Fig. 6. Program for Home



We repeated these experiments over two erroneous versions of the cache program,
also provided to us by Steve German. In both cases, Verification Process (4)
produced inductive assertions but they failed to imply the property of coherence.
Abstract. We present a methodology for constructing abstractions and
refining them by analyzing counter-examples. We also present a uniform
verification method that combines abstraction, model-checking and
deductive verification in a novel way. In particular, it allows and shows how
to use the set of reachable states of the abstract system in a deductive
proof even when the abstract model does not satisfy the specification
and when it simulates the concrete system with respect to a weaker
simulation notion than Milner’s.

1 Introduction 

Verification by abstraction (e.g. [15,16,12,25,13]) is a major technique for
verifying infinite-state and very large systems. This technique consists in finding an
abstraction relation and an abstract system that simulates the concrete one and
that is amenable to algorithmic verification. One then checks that the abstract
system satisfies an abstract version of the property of interest. Well-established
preservation results then allow one to deduce, for a large class of properties, that the
concrete system satisfies the concrete property if the abstract system satisfies
the abstract one.

In order for this technique to be used more widely, automatic techniques are
needed for 1) finding an accurate abstraction relation and 2) automatically
generating an abstract property and an abstract system that simulates the concrete
one. Several papers have discussed the automatic construction of the abstract
system, e.g. [17,6,14] for infinite-state systems. A less studied issue is that of
finding/constructing the abstract domain and the abstraction relation. The situation
is somewhat different in the case of program analysis where one is interested in
rather generic properties mainly concerning run-time errors. In this case,
depending on the programming paradigm (imperative, functional, or logic) and
depending on the properties to be checked, several adequate abstract domains
* This work has been partly performed while the first two authors were visiting the
Computer Science Laboratory, SRI International. Their visits were funded by NSF
Grants No. CCR-9712383 and CCR-9509931.

Contact Author. 

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 98-112, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 




Incremental Verification by Abstraction 






together with abstraction functions have been designed and extensively stud- 
ied [28]. In model-checking, however, as one is interested in verifying properties 
specific to a given system, one usually needs to generate for every system and 
property a new abstract domain and abstraction relation. Therefore, it is manda- 
tory to have automatic techniques assisting the user in finding the abstraction. 

In this paper, we describe an automatic abstraction technique for invariance
properties which is based on the set of atomic formulas appearing in successive
applications of the weakest (liberal) predicate transformer on the invariant to
be proved. This technique allows us to derive an abstraction function which is
then used to construct an abstract system and an abstract property. When the
property is true in the abstract system, we can conclude that the concrete system
satisfies the invariant. The question arises, however, how to proceed in case the
property is not satisfied in the abstract system. There are three possible reasons
why the abstract system may not satisfy the abstract property: 1) the abstraction
function is not fine enough to prove the property, that is, it identifies concrete
states that should be distinguished, 2) the abstract system contains superfluous
transitions that can be safely removed, that is, without altering the fact that
it is an upper approximation, and 3) the concrete system does not satisfy the
specification¹. The main contributions of this paper are, on the one hand, algorithms
for analyzing counter-examples that allow us either to construct concrete
counter-examples when this is possible or to refine the abstraction function. On the
other hand, we present a uniform verification method that combines abstraction,
model-checking and deductive verification in a novel way. In particular, it allows
and shows how to use the set of reachable states of the abstract system in a
deductive proof even when it simulates the concrete system in a weaker sense
than Milner’s notion of simulation.

For analyzing counter-examples, we present an algorithm that allows in many 
cases to analyze an infinite number of counter-examples at once. That is, the 
algorithm can deal with counter-examples that contain unfoldings of loops and 
where each time we unfold the loop we obtain a new counter-example. 

Using counter-examples to refine abstract systems has been investigated by
a number of other researchers, e.g. [23,1,11]. Closest to our work are the techniques
of Clarke et al. [11]. The main differences are, however, that we focus on infinite-state
systems and that our algorithms for analyzing counter-examples work backwards
while their algorithms are forward. This difference can lead to completely
different abstractions. Moreover, our technique allows in many cases to do in one
step a refinement that cannot be done in finitely many steps using their method.
The key issue here is that our technique incorporates accelerating the analysis
of counter-examples that involve the unfolding of loops. On the other hand, we
do not consider liveness properties. Also close to our work is Namjoshi and
Kurshan’s work [29] on computing finite bisimulations/simulations of infinite-state
systems. The main idea there is to start from a finite set of atomic formulae
and to successively split the abstract state space induced by these formulae until
stabilization. However, in contrast to [8,24,21], the splitting in [29] is done
on atomic formulae instead of equivalence classes which correspond to boolean
combinations of these. A similar idea is applied in [30].

¹ In the case of finite-state systems only reasons 1) and 3) are relevant, as a least
non-deterministic abstract system exists and can always be computed if we consider
abstraction functions, which is the case here. Computing this abstract system is, in
general, not possible for infinite-state systems, as incomplete decision procedures have
to be used for constructing the abstract system.

2 Preliminaries 

2.1 Invariants 

Given a set $X$ of typed variables, a state over $X$ is a type-consistent mapping
that associates with each variable $x \in X$ a value.

A transition system is given by a triple $(\Sigma, I, R)$, where $\Sigma$ is a set of states
over a set $X$ of variables, $I \subseteq \Sigma$ is a set of initial states, and $R \subseteq \Sigma^2$ is the
transition relation. A syntactic transition system is given by a triple $(X, \theta(X), \rho(X, X'))$,
where $X$ is a set of typed variables, $\theta(X)$ is a predicate describing the set of initial
states and $\rho(X, X')$ is a predicate describing the transition relation. We associate
in the usual way a transition system with every syntactic transition system.

A computation of a transition system $S = (\Sigma, I, R)$ is a sequence $s_0, \ldots, s_n$
such that $s_0 \in I$ and $(s_i, s_{i+1}) \in R$, for $i \le n - 1$. A state $s \in \Sigma$ is called
reachable in $S$, if there is a computation $s_0, \ldots, s_n$ of $S$ with $s_n = s$. We denote
by $\mathcal{R}(S)$ the set of states reachable in $S$.

A set $P \subseteq \Sigma$ is called an invariant of $S$, denoted by $S \models \Box P$, if every state
that is reachable in $S$ is in $P$. Given a set $P \subseteq \Sigma$ of states and a relation $R \subseteq \Sigma^2$,
the weakest liberal precondition of $R$ with respect to $P$, denoted by $wp(R, P)$ or
$wp_R(P)$, is the set consisting of states $s$ such that for every state $s'$, if $(s, s') \in R$
then $s' \in P$. The precondition of $R$ with respect to $P$, denoted by $pre_R(P)$, is the
pre-image of $P$ by $R$. We also sometimes write $pre(R)(P)$ instead of $pre_R(P)$.

All the semantic notions introduced so far have their syntactic counterparts 
which we assume as known. Moreover, we will tacitly interchange syntax and 
semantics, e.g. predicates and sets of states etc., unless there is a necessity to 
make a distinction. 
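
For a finite transition system given explicitly, the semantic notions above can be computed directly with sets. The following sketch is only meant to make the definitions concrete; the function names and the four-state example are hypothetical and do not come from the paper.

```python
def reachable(init, trans):
    """R(S): all states on some computation starting in an initial state."""
    seen, frontier = set(init), list(init)
    while frontier:
        s = frontier.pop()
        for u, t in trans:
            if u == s and t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def wp(states, trans, target):
    """Weakest liberal precondition: states all of whose successors lie in target."""
    return {s for s in states if all(t in target for u, t in trans if u == s)}


def pre(trans, target):
    """Precondition: states with at least one successor in target."""
    return {s for s, t in trans if t in target}


def is_invariant(init, trans, p):
    """S |= []P holds iff every reachable state is in P."""
    return reachable(init, trans) <= set(p)


# Hypothetical example: states 0..3, a loop between 0 and 1, and an edge 2 -> 3.
states = {0, 1, 2, 3}
trans = {(0, 1), (1, 0), (2, 3)}
init = {0}

print(reachable(init, trans))          # {0, 1}
print(wp(states, trans, {0, 1}))       # {0, 1, 3}: state 3 has no successors at all
print(pre(trans, {0, 1}))              # {0, 1}
print(is_invariant(init, trans, {0, 1, 2}))
```

Note the difference between the two operators on state 3: it belongs to the weakest liberal precondition vacuously, but not to the precondition.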

2.2 Abstractions 

Abstraction techniques [15,10] can be used to compute an over-approximation
of $\mathcal{R}(S)$. Basically, the idea consists in abstracting the considered system $S$
to a finite system $S^a$ such that the concretization of $\mathcal{R}(S^a)$ is a super-set of $\mathcal{R}(S)$.
The use of abstraction techniques in the context of model-checking is well-studied
[12,25]. The theory is based on the notion of simulation (also called
L-simulation, forward-simulation, ...) and on preservation results which tell us
which properties that are satisfied by $S^a$ are also satisfied by $S$.

A drawback of this method is that the simulation notion used does not take 
into account the invariance property we want to prove. To overcome this, we 
proposed in [7], the following invariant-dependent simulation notion. 
Definition 1. We say that $S^a$ is an abstraction of $S$ with respect to $\alpha \subseteq \Sigma \times \Sigma^a$
and $P \subseteq \Sigma$, denoted by $S \sqsubseteq_\alpha^P S^a$, if the following conditions are satisfied:

1. $\alpha$ is a total relation,

2. for all states $s_0, s_1 \in \Sigma$ and $s_0^a \in \Sigma^a$ with $s_0 \in P$ and $(s_0, s_0^a) \in \alpha$, if
   $(s_0, s_1) \in R$ then there exists a state $s_1^a \in \Sigma^a$ such that $(s_0^a, s_1^a) \in R^a$ and
   $(s_1, s_1^a) \in \alpha$,

3. $I \subseteq P$, and

4. for every state $s$ in $I$ there exists a state $s^a$ in $I^a$ such that $(s, s^a) \in \alpha$.

Now, it can be proved by induction on $n$ that for every computation $s_0, \ldots, s_n$
of $S$ such that $s_i \in P$, for every $i = 0, \ldots, n - 1$, there exists a computation
$s_0^a, \ldots, s_n^a$ of $S^a$ such that $(s_i, s_i^a) \in \alpha$, for every $i \le n$. Therefore, we can state
the following preservation result:

Theorem 1. Let $S$ and $S^a$ be transition systems such that $S \sqsubseteq_\alpha^P S^a$. Let $P^a \subseteq \Sigma^a$
and $P' \subseteq \Sigma$. If $\alpha^{-1}(P^a) \subseteq P \cap P'$, and $S^a \models \Box P^a$ then $S \models \Box(P \cap P')$. $\Box$
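
For finite systems, the four conditions of Definition 1 can be checked by enumeration. The sketch below follows the definition as reconstructed above; the function name `is_abstraction` and the parity example are hypothetical and only serve to illustrate the conditions.

```python
def is_abstraction(sigma, init, trans, sigma_a, init_a, trans_a, alpha, p):
    """Check the four conditions of the invariant-dependent simulation."""
    # 1. alpha is total: every concrete state has at least one abstract image.
    total = all(any((s, a) in alpha for a in sigma_a) for s in sigma)
    # 2. every concrete step leaving a P-state is matched by an abstract step.
    simulates = all(
        any((a0, a1) in trans_a and (s1, a1) in alpha for a1 in sigma_a)
        for (s0, s1) in trans if s0 in p
        for a0 in sigma_a if (s0, a0) in alpha
    )
    # 3. all initial states satisfy P.
    init_in_p = set(init) <= set(p)
    # 4. every initial state is related to some abstract initial state.
    init_covered = all(any((s, a) in alpha for a in init_a) for s in init)
    return total and simulates and init_in_p and init_covered


# Hypothetical example: a counter modulo 4 abstracted by its parity.
sigma = {0, 1, 2, 3}
trans = {(s, (s + 1) % 4) for s in sigma}
sigma_a = {"even", "odd"}
trans_a = {("even", "odd"), ("odd", "even")}
alpha = {(s, "even" if s % 2 == 0 else "odd") for s in sigma}

print(is_abstraction(sigma, {0}, trans, sigma_a, {"even"}, trans_a, alpha, sigma))
# Dropping an abstract transition breaks condition 2.
print(is_abstraction(sigma, {0}, trans, sigma_a, {"even"}, {("even", "odd")}, alpha, sigma))
```

Restricting `p` to a smaller set weakens condition 2, which is exactly the point of making the simulation depend on the invariant: steps leaving states outside $P$ need not be matched.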

3 A General Verification Rule 

In this section, we present a general rule for verifying invariance properties which
combines the two main approaches to the verification of invariance properties of
infinite-state systems:

1. the deductive approach which consists in applying a rule that allows one to reduce
the verification to proving a set of first-order formulas, and

2. the verification by abstraction approach which consists in abstracting the 
system in hand to a finite system which is then analyzed algorithmically 
using model-checking techniques. 

To do so, we fix throughout this section a transition system $S = (\Sigma, I, R)$
and a set $P \subseteq \Sigma$ of states. We then consider the problem of showing that $S$
satisfies the invariant $P$.

A uniform rule. While Theorem 1 allows us to deduce $S \models \Box P$ in case $S^a \models \Box P^a$,
it does not tell us whether it is possible to take advantage of $S^a$ in case
$S^a \not\models \Box P^a$. Rule (Inv-Uni) (see Fig. 1), which can be seen as a uniform presentation
of the deductive and the verification by abstraction approaches, addresses this
question. Indeed, the proof rule shows how concretizations of invariants of the
abstract system can be used to prove that the predicate $P$ is preserved by the
transition relation of $S$. In fact, these concretizations are used to weaken the
third premise of the rule (Inv-Uni).

Theorem 2. The proof Rule (Inv-Uni) (see Figure 1) is sound and complete.

Proof. Let us first show soundness. Let $S$ and $S^a$ be transition systems such
that (P1): $S \sqsubseteq_\alpha^P S^a$, (P2): $\alpha^{-1}(\mathcal{R}(S^a)) \subseteq Q$, (P3): $Q \cap P \cap P' \subseteq wp(R, P \cap P')$,
and (P4): $I \subseteq P'$.
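$$\begin{array}{c}
\text{There exists } \alpha \subseteq \Sigma \times \Sigma^a \\
S \sqsubseteq_\alpha^P S^a \\
\alpha^{-1}(\mathcal{R}(S^a)) \subseteq Q \\
Q \cap P \cap P' \subseteq wp(R, P \cap P') \\
I \subseteq P' \\
\hline
S \models \Box(P \cap P')
\end{array}$$

Fig. 1. Proof rule (Inv-Uni).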



Let $s_0, \ldots, s_n$ be a computation of $S$. We prove by induction on $n$ that
$s_n \in P \cap P'$. Now, since (P1) implies that all initial states of $S$ satisfy $P$, and
since $I \subseteq P'$, we have $s_0 \in P \cap P'$. Moreover, from (P1) and (P2), we have
$s_{n-1} \in Q$, and hence by induction hypothesis, $s_{n-1} \in Q \cap P \cap P'$. Therefore,
from (P3), $s_n \in P \cap P'$.

Completeness of Rule (Inv-Uni) is easily proved along the same lines as for
the completeness of the standard rule for proving invariants, e.g. [26]. $\Box$

3.1 Concretizing OBDD’s 

To be able to apply Rule (Inv-Uni) we need:

1. a finite representation of the set of abstract reachable states and

2. to transform this representation into a finite representation of its
concretization, that is, $\alpha^{-1}(\mathcal{R}(S^a))$.

Symbolic model checkers like SMV use ordered binary decision diagrams (OBDDs)
to represent sets of states. To do so, all abstract variables are encoded
by boolean variables. An OBDD is very easily transformed into a propositional
formula in disjunctive normal form. Such a representation can, however, be
unnecessarily cumbersome.

In this section, we describe an algorithm for converting an OBDD into a
propositional formula over the original state variables (not necessarily boolean),
which is often almost as compact as the original OBDD.

Consider first a simple case when the top variable $x$ of an OBDD $b$ is boolean.
Then, by the Shannon-Boole expansion law, $b = x \cdot b|_{x=\mathit{true}} + \bar{x} \cdot b|_{x=\mathit{false}}$.
Equivalently, this can be written as a formula

$$(x = \mathit{false} \rightarrow \mathit{formula}(b|_{x=\mathit{false}})) \;\wedge\; (x = \mathit{true} \rightarrow \mathit{formula}(b|_{x=\mathit{true}})),$$

where $\mathit{formula}(b)$ is a formula corresponding to the BDD $b$. Generalizing this
to program variables with arbitrary number of possible values represented as a
vector of boolean variables $x = (x_1, \ldots, x_n)$, and assuming that the $x_i$'s are the $n$
top variables in $b$, we can recursively construct a formula

$$\bigwedge_{v \in \mathit{type}(x)} (x = v \rightarrow \mathit{formula}(b|_{x=v})).$$
    bdd2f(b: BDD, var_list: list of variables): formula =
      if b ∈ H then return H(b);
      if b = true_bdd then res := TRUE;
      else if b = false_bdd then res := FALSE;
      else
        x := car(var_list);
        res := TRUE;
        for every v ∈ type(x) do
          tmp := bdd2f(b|x=v, cdr(var_list));
          res := res ∧ (x = v → tmp);
        end;
        H ← (b, res);
      end if
      return res;
    end bdd2f

Fig. 2. Basic algorithm converting BDD to a formula.



The basic algorithm is shown in Figure 2. It takes a BDD $b$ and the list of
program state variables (not necessarily boolean), and returns a formula
equivalent to $b$. It uses a hash table $H$ that hashes pairs of the form $(b, f)$, where
$f$ is a formula previously constructed for a BDD $b$. At the very beginning the
algorithm checks whether $b$ is already in the table, and if it is, it simply returns
the associated formula. If the formula has not been constructed yet, it checks for
the trivial base cases (TRUE or FALSE). If we are not at the base case, then it
constructs a formula recursively on the BDD structure. For every value in the
type of the first variable² we restrict $b$ to that value, remove the variable from
the list, construct the formula recursively for that restricted BDD, and add the
result into the final formula. Finally, the result is included into the hash table
before it is returned.
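
The recursion of Figure 2 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: instead of a real OBDD package it assumes a small decision-diagram class `Node` in which each program variable is a single multi-valued node (rather than a vector of boolean variables), formulas are nested tuples, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    var: str          # program variable tested at this node
    children: tuple   # pairs (value, sub-diagram), one per value of the variable's type


def restrict(b, x, v):
    """Cofactor b|x=v; b is returned unchanged when x is not its top variable."""
    if isinstance(b, Node) and b.var == x:
        return dict(b.children)[v]
    return b


def bdd2f(b, var_list, types, table):
    """Convert diagram b into a formula, memoizing results in the hash table."""
    if b in table:
        return table[b]
    if b is True or b is False:
        res = ("const", b)
    else:
        x, rest = var_list[0], var_list[1:]
        res = ("const", True)
        for v in types[x]:
            tmp = bdd2f(restrict(b, x, v), rest, types, table)
            res = ("and", res, ("implies", ("eq", x, v), tmp))
    table[b] = res
    return res


def evaluate(f, env):
    """Truth value of a formula under an assignment env of variables to values."""
    tag = f[0]
    if tag == "const":
        return f[1]
    if tag == "eq":
        return env[f[1]] == f[2]
    if tag == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if tag == "implies":
        return (not evaluate(f[1], env)) or evaluate(f[2], env)
    raise ValueError(tag)


def eval_diagram(b, env):
    """Follow the diagram along env until a terminal is reached."""
    while isinstance(b, Node):
        b = dict(b.children)[env[b.var]]
    return b


# Hypothetical example: true when pc = l2, or when pc = l1 and a is false.
types = {"pc": ("l1", "l2", "l3"), "a": (False, True)}
leaf = Node("a", ((False, True), (True, False)))
diagram = Node("pc", (("l1", leaf), ("l2", True), ("l3", False)))
formula = bdd2f(diagram, ["pc", "a"], types, {})
```

Because the result is a tuple that refers to already-built sub-tuples, shared sub-diagrams yield shared sub-formulas, which mirrors the pointer-based sharing discussed next.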

If the internal representation of the formula being constructed is done using
pointers, then multiple occurrences of the same subformula in the final formula
do not cause the formula to grow exponentially in the size of $b$. In fact, its size
is only linear. However, the formula cannot be easily printed without losing this
structure sharing. A simple solution to that would be to print the subformulas
collected in the hash table with names assigned to them, and then print the final
formula that has the names instead of these subformulas. However, the formula
will be ugly and hardly manageable both for a human and for a mechanical tool
reading it. We designed a set of simplifications that make the formula look a lot
more understandable and even more compact. These transformations are applied
for each program variable before the function returns from the recursive call.

Note on the basic algorithm: we assume that the types are always finite.

Example 1. Let us consider the abstract system of the Bakery example (see
e.g. [7]). If we apply the basic algorithm (see Figure 2) to the OBDD that
characterizes the reachable states of this abstract system, we obtain the
following formula:

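\[
\begin{aligned}
&(a_1 \Rightarrow (a_2 \Rightarrow a_3 \wedge pc_1 = l_{11} \wedge pc_2 = l_{21})\ \wedge \\
&\qquad\quad (\neg a_2 \Rightarrow a_3 \wedge pc_1 = l_{11} \wedge pc_2 \neq l_{21}))\ \wedge \\
&(\neg a_1 \Rightarrow (a_2 \Rightarrow \neg a_3 \wedge pc_1 \neq l_{11} \wedge pc_2 = l_{21})\ \wedge \\
&\qquad\quad (\neg a_2 \Rightarrow (a_3 \Rightarrow pc_1 \neq l_{11} \wedge pc_2 = l_{22})\ \wedge \\
&\qquad\qquad\qquad (\neg a_3 \Rightarrow pc_1 = l_{12} \wedge pc_2 \neq l_{21})))
\end{aligned}
\]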

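The concretization of the above formula yields the conjunction of the following
formulae:

\[
\begin{aligned}
(p_1 = 0 \wedge p_2 = 0) &\Rightarrow pc_1 = l_{11} \wedge pc_2 = l_{21} && (1)\\
(p_1 = 0 \wedge p_2 > 0) &\Rightarrow pc_1 = l_{11} \wedge pc_2 \neq l_{21} && (2)\\
(p_1 \neq 0 \wedge p_2 = 0) &\Rightarrow pc_1 \neq l_{11} \wedge pc_2 = l_{21} && (3)\\
p_1 \neq 0 \wedge p_2 \neq 0 \wedge p_1 \le p_2 &\Rightarrow pc_1 \neq l_{11} \wedge pc_2 = l_{22} && (4)\\
p_1 \neq 0 \wedge p_2 \neq 0 \wedge p_1 > p_2 &\Rightarrow pc_1 = l_{12} \wedge pc_2 \neq l_{21} && (5)
\end{aligned}
\]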



In this example, the concrete invariant obtained by this approach is stronger than 
the invariant generated by the method presented in [9,5]. The invariants (4) and 
(5) cannot be immediately obtained by these methods. Indeed, these methods 
cannot easily generate invariants relating the variables of different processes. 



4 Analyzing Counter-examples and Refining Abstraction 
Relations 

A key issue in applying the verification method described by Theorem 1,
respectively Rule (Inv-Uni), is finding a suitable abstraction relation $\alpha$.
In this section, we discuss a heuristic for finding an initial abstraction
relation and present a method for refining it by analyzing abstract
counter-examples, that is, counter-examples of the abstract system.



4.1 Initial Abstraction Relation 

Assume that we are given a syntactic transition system $S = (V, \Theta, \rho)$
and a quantifier-free formula $P$ with free variables in $V$. Henceforth, we
assume that $\rho$ is given as a finite disjunction of transitions
$\tau_1, \ldots, \tau_k$, where each $\tau_i$ is given by a guard $g_i$ that is
a quantifier-free formula and a multiple assignment
$x_1, \ldots, x_n := e_1, \ldots, e_n$.

We want to prove that $P$ is an invariant of $S$. To do so, we choose a constant
$N \in \omega$ and compute $\bigwedge_{i \le N} wp^i_\rho(P)$. Then,
$\bigwedge_{i \le N} wp^i_\rho(P)$ is also a quantifier-free formula. Let
$F = \{f_1, \ldots, f_m\}$ be the set of atomic formulas that appear in
$\bigwedge_{i \le N} wp^i_\rho(P)$, in the predicate describing the initial
states or in the property. (Notice that one can choose $N$ sufficiently large to
include the atomic formulae in the guards.) Then, we introduce for every formula
$f_i$ an abstract variable $a_i$ and define the abstraction function $\alpha$ by
$a_i \equiv f_i$. In [6], we show how, given a transition system $S$, a
predicate $P$ and an abstraction function $\alpha$, we compute an abstract
system $S^a$ of $S$ and a predicate $P^a$ such that
$\alpha^{-1}(P^a) \subseteq P$.
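
As an illustration of how an abstraction function is induced by a set of atomic formulas, the following Python sketch evaluates each atom on a concrete state. It assumes that concrete states are dictionaries over integer variables and that the concretization is computed over an explicitly enumerated set of states; the names `make_alpha`, `alpha_inv` and the two atoms are hypothetical.

```python
# Sketch only: concrete states are dicts from variable names to integers, and
# each atomic formula f_i is a Python predicate. All names are illustrative.

def make_alpha(atoms):
    """Return alpha, mapping a concrete state to the values of a_1, ..., a_m."""
    def alpha(state):
        return tuple(bool(f(state)) for f in atoms)
    return alpha

def alpha_inv(alpha, abs_state, universe):
    """Concretization restricted to an explicitly enumerated set of states."""
    return [s for s in universe if alpha(s) == abs_state]

# Hypothetical atoms over two integer variables x and y.
atoms = [lambda s: s["x"] == s["y"], lambda s: s["x"] > 0]
alpha = make_alpha(atoms)
universe = [{"x": x, "y": y} for x in range(3) for y in range(3)]
same_and_positive = alpha_inv(alpha, (True, True), universe)
```

A symbolic implementation would keep the concretization as a formula instead of enumerating states; the enumeration is used here only to keep the sketch executable.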

Rule (Inv-Uni) addresses the question of how to benefit from computing the set
of reachable states of $S^a$ even when $S^a$ does not satisfy $\Box P^a$. In
this section, we address the following questions:

1. given a counter-example for $S^a \models \Box P^a$, does it correspond to
some behavior in the concrete system, and

2. in case the answer to the first question is no, how can we use the given
counter-example to refine the abstraction function.

Identifying false negatives. As we focus on invariance properties in this paper,
counter-examples are finite computations. Let
$\sigma^a = s_0^a \tau_1^a s_1^a \cdots \tau_n^a s_n^a$ be a counter-example for
$S^a \models \Box P^a$. The concretization of $\sigma^a$ is the sequence
$\alpha^{-1}(s_0^a)\, \tau_1\, \alpha^{-1}(s_1^a) \cdots \tau_n\, \alpha^{-1}(s_n^a)$.
We call $\alpha^{-1}(\sigma^a)$ a symbolic computation of $S$ if there exists a
computation $s_0 \tau_1 s_1 \cdots \tau_n s_n$ of $S$ such that
$s_i \in \alpha^{-1}(s_i^a)$, for $i = 0, \ldots, n$. Clearly, this definition
can be generalized to arbitrary sequences $Q_0 \tau_1 Q_1 \cdots \tau_n Q_n$,
with $Q_i \subseteq \Sigma$. Then, we have the following:

Lemma 1. A sequence $Q_0 \tau_1 Q_1 \cdots \tau_n Q_n$, with
$Q_i \subseteq \Sigma$, is a symbolic computation iff
$\Theta \cap X_0 \neq \emptyset$, where $X_n = Q_n$ and
$X_{n-i-1} = Q_{n-i-1} \cap pre_{\tau_{n-i}}(X_{n-i})$.

□

Lemma 1 suggests the procedure ConAnal given in Figure 3 for checking whether
an abstract counter-example is a false negative or whether it corresponds to a
behavior of the concrete system.
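
The backward computation of Lemma 1 can be sketched on explicit finite sets as follows. This is a minimal illustration, assuming that concrete states are integers, that each transition is a set of (source, target) pairs, and that the concretization of an abstract state is available as a set; the function names and the small example are hypothetical and not taken from the paper.

```python
# Sketch only: states are integers, each transition tau_i is a set of
# (source, target) pairs, and alpha_inv maps an abstract state to a set.

def pre(tau, X):
    """States that have at least one tau-successor in X."""
    return {s for (s, t) in tau if t in X}

def con_anal(abs_trace, taus, alpha_inv, init):
    """abs_trace = [s0a, ..., sna], taus = [tau_1, ..., tau_n]."""
    n = len(taus)
    xs = [set() for _ in range(n + 1)]
    xs[n] = set(alpha_inv(abs_trace[n]))
    i = n
    while xs[i] and i > 0:            # X_{i-1} = Q_{i-1} & pre_{tau_i}(X_i)
        xs[i - 1] = pre(taus[i - 1], xs[i]) & set(alpha_inv(abs_trace[i - 1]))
        i -= 1
    if i == 0 and (init & xs[0]):
        s = min(init & xs[0])         # any state of init & X_0 would do
        path = [s]
        for k in range(n):            # replay forward, staying inside the X_k
            s = min(t for (u, t) in taus[k] if u == s and t in xs[k + 1])
            path.append(s)
        return ("concrete", path)
    return ("spurious", i, xs[min(i + 1, n)])

# Hypothetical abstraction of the states 0..3 into two blocks.
blocks = {"lo": {0, 1}, "hi": {2, 3}}
step = {(0, 1), (1, 2), (2, 3)}
real = con_anal(["lo", "lo", "hi"], [step, step], blocks.__getitem__, {0})
fake = con_anal(["lo", "hi"], [step], blocks.__getitem__, {0})
```

In the spurious case the sketch returns the index at which the backward analysis failed together with the last non-empty set, which is the information used for refinement.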



Input: An abstract counter-example σ^a = s_0^a τ_1^a s_1^a ⋯ τ_n^a s_n^a

X := α⁻¹(s_n^a);
i := n;
while (X ≠ ∅ and i > 0) do
    Y := X;
    X := pre_{τ_i}(X) ∩ α⁻¹(s_{i-1}^a);
    i := i − 1
od
if i = 0 and Θ ∩ X ≠ ∅ then return "the following is a counter-example:"
    Take any s ∈ Θ ∩ X
    Let s_0 := s, s_1 := τ_1(s_0), …, s_n := τ_n(s_{n-1})
    write s_0 ⋯ s_n
else return i, Y
fi



Fig. 3. Counter-example Analyzer: ConAnal

Refining the abstraction function. First, we consider a simple refinement
strategy of the abstraction function. Thus, let
$\sigma^a = s_0^a \tau_1^a s_1^a \cdots \tau_n^a s_n^a$ be a counter-example
for $S^a \models \Box P^a$ that is not a symbolic computation of $S$. By
Lemma 1, procedure ConAnal returns some $i < n$ and a set
$Y = X_i \subseteq \Sigma$ such that
$Q_{i-1} \cap pre_{\tau_i}(X_i) = \emptyset$.³ Now, since
$Q_{i-1} \cap pre_{\tau_i}(X_i) = \emptyset$, we have
$Q_{i-1} \subseteq wp_{\tau_i}(\neg X_i)$, and abstract transitions from
abstractions of states in $Q_{i-1}$ to abstractions of states in $X_i$ are
superfluous and should be omitted. To achieve this, we add for every atomic
formula $f$ in $\neg X_i$ which is not already in $\alpha$ a corresponding new
abstract variable $a_f$ with $a_f \equiv f$. Let $\alpha_e$ denote the
so-obtained new abstraction function. Moreover, let $S^a_e$ be the abstract
system with $(s_1^a, s_2^a) \in \rho_e$ iff there exist concrete states
$s_1, s_2$ such that $(s_i, s_i^a) \in \alpha_e$, for $i = 1, 2$, and
$(s_1, s_2) \in \rho$. Then, $\sigma^a$ is not a computation of $S^a_e$.

Speeding-up refinement of abstraction functions. The simple illustrative
example given in Figure 4 shows that in general applying the procedure ConAnal
finitely many times is not sufficient. In this example, we want to show that
location $l_2$ is not reachable and we initially take the abstraction function
defined by $a \equiv x = y$. After the $n$-th application of ConAnal we will
have the abstraction function defined by
$a_0 \equiv x = y, \ldots, a_i \equiv x + i = y, \ldots, a_n \equiv x + n = y$.
However, the abstraction function we need is
$a_0 \equiv x = y$, $a_1 \equiv x > y$. The problem here is clearly that the
abstract counter-examples contain abstract transitions that correspond to the
unfolding of a loop in the concrete system. In the following, we generalize
procedure ConAnal to cope with this situation. Let us first explain the main
idea.




Fig. 4. Example for showing that speeding-up is needed 



Henceforth, we assume that the description of the concrete system makes a
clear distinction between control and data variables. That is, we assume that
the concrete system is given by an extended transition system as in Figure 4,
where the $l_i$ are the control locations and $x$ and $y$ are the data
variables.

Let $\sigma^a = s_0^a \tau_1^a s_1^a \cdots \tau_n^a s_n^a$ be a
counter-example for $S^a \models \Box P^a$. Assume that
$\tau_{i_0}, \ldots, \tau_{i_k}$ is a loop in the control graph of the concrete
system. In the procedure ConAnal we apply $pre_{\tau_i}$ once on each $X_i$.
However, since $\tau_{i_0}, \ldots, \tau_{i_k}$ is a loop, it is more
interesting to apply $pre(\tau_{i_0}, \ldots, \tau_{i_k})$ an arbitrary number
of times on $X_{i_k}$, that is, to consider
$\bigcup_{j \ge 0} pre^j(\tau_{i_0}, \ldots, \tau_{i_k})$ on $X_{i_k}$.

³ We assume that $i > 0$, as the case $i = 0$ is easily handled.

For instance, in the example of Figure 4, applying $pre^*(x{+}{+})$ on $x = y$
gives after quantifier elimination the predicate $x \le y$. Now, since
$pre(x, y := 1, 0)(x \le y)$ is empty, our strategy consists in adding an
abstract variable $b$ such that $b$ is true in the abstraction of a state $s$
iff $s$ satisfies $\neg(x \le y)$, which is $x > y$; this is what we indeed
expect.
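
The effect of iterating the pre-image of the loop can be checked on a bounded grid. The following Python sketch is only an approximation of the symbolic computation: it assumes that x and y range over 0..N (a hypothetical bound introduced here), enumerates states explicitly, and computes the iterated pre-image of x++ as a fixed point instead of by quantifier elimination.

```python
# Sketch only: x and y are restricted to 0..N so that states can be enumerated.
N = 6
grid = {(x, y) for x in range(N + 1) for y in range(N + 1)}

def pre_inc(X):
    """Pre-image of X under the loop body x := x + 1 (y unchanged)."""
    return {(x, y) for (x, y) in grid if (x + 1, y) in X}

def pre_star(X):
    """Union of all finite iterations of pre_inc applied to X."""
    reached = set(X)
    while True:
        new = reached | pre_inc(reached)
        if new == reached:
            return reached
        reached = new

target = {(x, y) for (x, y) in grid if x == y}
accelerated = pre_star(target)
below_or_equal = {(x, y) for (x, y) in grid if x <= y}
initial = (1, 0)                      # the state produced by x, y := 1, 0
```

On this grid the accelerated set coincides with the states satisfying x ≤ y, and the state produced by the initial assignment lies outside it, which matches the argument above.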

This idea of speeding up counter-example analysis leads to the procedure
AccConAnal given in Figure 5.

Input: An abstract counter-example σ^a = s_0^a τ_1^a s_1^a ⋯ τ_n^a s_n^a;
       Let L_1, …, L_m be loops in the concrete system such that
       L_j = τ_{i_j}, …, τ_{i_j + k_j} and
       τ_1, …, τ_n = τ_1, …, L_1, τ_{i_1 + k_1 + 1}, …, L_m, τ_{i_m + k_m + 1}, …, τ_n

X := α⁻¹(s_n^a);
i := n;
k := m;
while (X ≠ ∅ and i > 0) do
    if i = i_k then
        i := i − length(L_k)
    else X := pre_{τ_i}(X) ∩ α⁻¹(s_{i-1}^a)
    fi
    i := i − 1
od
if i = 0 and Θ ∩ X ≠ ∅ then return "S does not satisfy the property"
else return i, Y
fi

Fig. 5. Accelerated Counter-example Analyzer: AccConAnal

There are several remarks to make about procedure AccConAnal. The first one is
that for a sequence $\tau_1, \ldots, \tau_n$ of transitions there are in general
several but finitely many ways to partition it into loops $L_1, \ldots, L_m$ and
the remaining transitions. The accuracy of the obtained abstraction function
depends on this choice. In principle, one could, however, consider all possible
choices and combine the obtained abstraction functions into a single one (take
their conjunction). Another point is that in order to have reasonably simple
abstraction functions one needs to simplify the predicates computed for the
loops, that is, the iterated pre-images intersected with the concretized
abstract states; in particular, when possible, one should eliminate the
existential quantification on the number of iterations $i$.

5 Example

To illustrate how we can use the procedure ConAnal, we consider the verification
of the Bounded Retransmission protocol [27,18,20,19], BRP for short.

Fig. 6. The Bounded Retransmission Protocol.

The BRP accepts requests from a producer to transmit a file of data to
a consumer. The protocol consists of a sender at the producer side and a receiver
at the consumer side (see Figure 6). The sender transmits data frames to the
receiver via channel K and waits for acknowledgment via channel L. Since these
channels may lose messages, timeouts are used to identify a loss of messages.
After sending
a message, the sender waits for an acknowledgment. When the acknowledgment 
arrives, the sender either proceeds with the next message in the file, if there 
is one, or sends a confirmation message to the producer. However, if a timeout 
occurs before reception of an acknowledgment, the sender retransmits the same 
message. This procedure is repeated as often as specified by a parameter max. On 
its side, the receiver after acknowledging a message that is not the last one waits 
for further messages. If no new message arrives before a timeout, it concludes 
that there is a loss of contact to the sender and reports this to the consumer. 
Since the same message may be sent several times by the sender, a data frame 
includes a bit to indicate whether the same datum is resent or not. In fact, the 
BRP protocol can be seen as an extension of the alternating bit protocol [2]. 
The protocol is responsible for informing the producer whether the file has been 
transmitted correctly. On the consumer side, the protocol passes data frames 
indicating whether the datum is the first one in a file, the last one, or whether it 
is an intermediate one. Thus, a data frame also contains the information whether
the data is the first, the last, or an intermediate one. A third timeout is used
in case a transmission has been interrupted to ensure that the sender waits long
enough to be sure that the receiver is prepared to receive new frames.

Correctness Criterion. In the original formulation the requirements on the
protocol are given by an abstract BRP-spec, and the task is to prove that BRP
implements BRP-spec. To reduce the problem of proving that BRP implements
BRP-spec to an invariance problem, we consider a superposition of BRP and
BRP-spec and prove that the superposed protocol, BRP+, satisfies the invariance
property $\Box \mathit{Safe}$, where Safe is a variable that is set to false as
soon as BRP makes a transition that is not allowed by BRP-spec. It should be
noted that BRP+ contains for many variables of the protocol two different copies
corresponding to the variable in BRP and BRP-spec, respectively. So, for
instance, there are two variables file and afile which correspond to the file to
be sent and two variables head and ahead which correspond to the position of the
data being processed in file and afile, respectively. It is also worth
mentioning that the variables head and ahead are never compared in BRP+. Hence,
no relation between these variables can be deduced from the specification of
BRP+. The property that says the data transmitted is the same as the data
received is important for the verification of this protocol.



5.1 Verification of the Protocol 



The BRP protocol represents a family of parameterized protocols. The parameters
are the number of allowed retransmissions max, the length of a file last,
and finally, the data type Data. Let us now describe the main steps we followed
in the verification of BRP using InVeSt [7].

An initial abstraction function is generated automatically and used to compute
an abstract system of BRP+. This initial abstraction function is obtained from
the predicates describing the initial states, the specification and
$\bigwedge_{i \le N} wp^i_\rho(P)$ with $N = 1$. The abstraction is the identity
on variables ranging over finite domains. The concrete variables that range over
an infinite, resp. parameterized, domain and their abstract versions are
(partially) given in the following table:



Concrete predicate              Abstract version
head < last                     file = MANY
head = last                     file = ONE
head = last + 1                 file = NONE
afile[ahead] = data(msg)        msgafile
afile[ahead] = data(k)          kafile
rn = 0                          ¬Rn



It turns out that the abstract system obtained by the initial abstraction does
not satisfy the specification. The trace provided by the model-checker SMV has 6
states, and each state contains 39 variables (not all of them booleans). This
trace is concretized and checked using ConAnal. The result of this analysis is
that this counter-example is spurious, and that we have to add a boolean
abstract variable hah that is true iff head = ahead. Then, analyzing the
abstract system obtained by this new abstraction shows a new counter-example. In
this new counter-example, first head is incremented, which on the abstract level
assigns false to hah as initially head = ahead. Then, after a few steps, ahead
is incremented. Thus, though at the concrete level head = ahead, this cannot be
inferred at the abstract level. Indeed, applying ConAnal shows us that we have
to add a new abstract variable corresponding to head = ahead + 1. Again, the
abstract system obtained by this new abstraction shows a new counter-example.
This time the incrementation of ahead precedes that of head and we have to add a
new abstract variable corresponding to head + 1 = ahead. And now we are done.
The new abstraction is fine enough for proving the property; the constructed
finite abstract system satisfies the specification.

As a second experiment we used the proof rule (Inv-Uni) to verify the
BRP protocol. We started with the initial abstract system. But rather than going
through the refinement process of the abstraction function, we concretized, using
the procedure described in Section 3, the OBDD that characterizes the abstract
reachable states of the first abstract system. One iteration of strengthening was
needed to prove the desired property of the BRP protocol.



6 Conclusion 

We have presented a novel verification methodology that combines abstraction,
model-checking and deductive methods. To support this methodology, and in
particular the verification by abstraction method, we developed techniques for
refining abstraction functions. These are based on the analysis of
counter-examples and allow in many cases the simultaneous analysis of infinitely
many examples by applying acceleration techniques. These techniques have been
implemented in the tool InVeSt, which is a tool for verifying invariance
properties of infinite-state systems. InVeSt is based on PVS and connected to
SMV. Since then we applied these techniques to several interesting examples and
the results are promising.

In contrast to [11] we did not consider liveness properties. For infinite-state
systems, the key issue there is finding techniques for automatically generating
ranking functions and fairness conditions (cf. [3,4,22]).

Abstract. Most of the properties established during verification are
either invariants or depend crucially on invariants. The effectiveness of
automated formal verification is therefore sensitive to the ease with which
invariants, even trivial ones, can be automatically deduced. While the
strongest invariant can be defined as the least fixed point of the strongest
post-condition of a transition system starting with the set of initial states,
this symbolic computation rarely converges. We present a method for
invariant generation and strengthening that relies on the simultaneous
construction of least and greatest fixed points, restricted widening and
narrowing, and quantifier elimination. The effectiveness of the method is
demonstrated on a number of examples.



1 Introduction 

The majority of properties established during the verification of programs are
either invariants or depend crucially on invariants. Indeed, safety properties can
be reduced to invariant properties, and to prove progress one usually needs to
establish auxiliary invariance properties too. Consequently, the discovery and
strengthening of invariants is a central technique in the analysis and
verification of both sequential programs and reactive systems, especially for
infinite state systems.

Consider, for example, a program with state variables pc and x. The program
counter pc is interpreted over the control locations inc and dec, and x is
interpreted over the integers. Initially, the program counter pc is set to inc
and x to 0. The dynamics of the system is described in terms of the guarded
commands:

    pc = inc            ↦  x := x + 2;  pc := dec
    pc = dec ∧ x > 0    ↦  x := x − 2;  pc := inc

Suppose we are interested in establishing the invariant
$pc = inc \Rightarrow x = 0$. A naive proof attempt fails, and consequently, the
invariant needs to be strengthened to an inductive invariant
$(pc = inc \Rightarrow x = 0) \wedge (pc = dec \Rightarrow x = 2)$. Such
strengthenings are typically needed in induction proofs. In general, the main
principle for proving that a predicate $\varphi$ is an invariant of some program
or system $S$ consists in finding an auxiliary predicate $\psi$ such that $\psi$
is stronger than $\varphi$ and $\psi$ is inductive; i.e., every initial state of
$S$ satisfies $\psi$, and $\psi$ is preserved under all transitions. This rule
is sound and (relatively) complete. On the other hand, finding a strengthening
$\psi$ is not always obvious, and usually requires a microscopic examination of
failed verification attempts.

* The research described in this paper was supported in part by NSF contract
CCR-9712383 and DARPA/AFRL contract F33615-00-C-3043.
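
The difference between the candidate invariant and its inductive strengthening can be checked mechanically. The sketch below is only an illustration: it assumes that the second guarded command resets pc to inc, and it tests preservation over a bounded range of x values (the bound and all function names are hypothetical) rather than proving it symbolically.

```python
# Sketch only: explicit-state check over a bounded range of x values.

def step(state):
    """Successors of a state (pc, x) under the two guarded commands."""
    pc, x = state
    succ = []
    if pc == "inc":
        succ.append(("dec", x + 2))
    if pc == "dec" and x > 0:
        succ.append(("inc", x - 2))   # assumes the command resets pc to inc
    return succ

def weak(state):
    """Candidate invariant: pc = inc implies x = 0."""
    pc, x = state
    return pc != "inc" or x == 0

def strong(state):
    """Strengthening: additionally, pc = dec implies x = 2."""
    pc, x = state
    return weak(state) and (pc != "dec" or x == 2)

def is_inductive(inv, init, universe):
    """inv holds initially and is preserved by every transition from inv."""
    holds_initially = all(inv(s) for s in init)
    preserved = all(inv(t) for s in universe if inv(s) for t in step(s))
    return holds_initially and preserved

universe = [(pc, x) for pc in ("inc", "dec") for x in range(-10, 11)]
init = [("inc", 0)]
weak_ok = is_inductive(weak, init, universe)
strong_ok = is_inductive(strong, init, universe)
```

The weak candidate fails because a state such as (dec, 5) satisfies it while its successor (inc, 3) does not; the strengthened predicate only admits (inc, 0) and (dec, 2), which map to each other.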

Most approaches for generating and strengthening invariants are based on
symbolic computation of the system at hand [10, 15, 4]. The bottom-up method
performs an abstract forward propagation to compute the set of all reachable
configurations, while the top-down method starts from an invariant candidate
$\varphi$ and performs an abstract backward propagation to compute a
strengthened invariant $\psi$. There is, however, no guarantee for success in
exact forward or backward propagation. This may be due either to infinite or
unmanageably large configuration spaces or to the failure to detect convergence
of the propagation methods altogether. Consequently, approximation techniques
such as widening or narrowing [8] are needed to enforce termination of symbolic
computation. The basic idea is to accelerate the convergence of symbolic
computations in infinite abstract domains.

The framework of abstract interpretation with widening and narrowing as
outlined in [8], however, is not immediately applicable to the discovery and
strengthening of inductive invariants, since not every over-approximation of an
inductive invariant is necessarily an inductive invariant. Our main
contributions are: first, we provide an abstract description of the process of
inductive invariant generation and strengthening based on computing under- and
over-approximations of the reachable state set; second, this framework is
instantiated with a novel technique based on combining concrete widening and
narrowing operators. Our techniques can uniformly be used on a wide class of
examples including transition systems where both forward and backward
propagation do not converge. We demonstrate the effectiveness of our approach
through a variety of examples.

Our algorithm is based on the symbolic computation of a sequence of under-
and over-approximations of the reachable state set. These computations rely
heavily on the elimination of quantifiers in the underlying theory. Quantifier
elimination, however, is not required to return equivalent formulas, since our
algorithm tolerates weakened quantifier-eliminated formulas. Whenever the
computation of the sequence of under-approximations terminates, we get an
inductive invariant. Moreover, since every element in the sequence of decreasing
over-approximations is an inductive invariant, our algorithm can be stopped
at any time and it outputs the best (strongest) inductive invariant computed
up to this point. In the example above, our procedure yields the invariant
$(pc = inc \Rightarrow x = 0) \wedge (pc = dec \Rightarrow x = 2)$.¹

The approach faces two problems. First, the computation of the sequence of
under-approximations usually does not terminate. Second, the computation of
the sequence of over-approximations terminates with very weak invariants, in
practice. For instance, forward reachability does not converge in case the
initial value for x is unspecified in the example above. In order to overcome
these problems we add specialized widening and narrowing operators to our
algorithm. One of the distinguishing features of our algorithm is the use of
unreachable configurations for detecting unreachable strongly connected
components and computing corresponding narrowing operators. In this way, our
algorithm terminates with the invariant $x > -2$ in case the initial value for
x is unspecified in our running example.

¹ This example can also be handled by some other invariant generation techniques
based on forward reachability or abstraction [3, 17].

The paper is structured as follows. In Section 2 we introduce notation and 
definitions. Section 3 presents the theoretical framework that is used in Section 4 
to obtain a procedure for generating invariants using affirmation and propagation 
rules along with widening and narrowing. Finally, we conclude in Section 5 with a 
short investigation of the relationship between invariant generation and abstract 
interpretation, and comparisons with related work. 

2 Preliminaries 

Let $\mathcal{L}$ be a first-order language containing interpreted symbols for
standard concrete domains like booleans, integers and reals. Let $\Re$ denote
the (first-order) theory of interest over the language $\mathcal{L}$. We fix the
set $V = \{x_1, \ldots, x_n\}$ of (typed) variables and denote by $\mathcal{F}$
the set of first-order formulas over $\mathcal{L}$ with free variables contained
in the set $V$. A transition system $S$ is a tuple $(V, \Theta, \Phi)$, where
$\Theta \in \mathcal{F}$ and $\Phi$ is a first-order formula over $\mathcal{L}$
with free variables contained in the set $V \cup V'$, where
$V' = \{x_1', \ldots, x_n'\}$. The formula $\Theta$ is called the initial
predicate and the formula $\Phi$ a transition predicate of the system $S$. We
shall denote the sequence $x_1, \ldots, x_n$ by $x$ and the sequence
$x_1', \ldots, x_n'$ by $x'$.

A state $\sigma$ of a transition system $S = (V, \Theta, \Phi)$ is a mapping
from $V$ to values from the corresponding domains. If $\rho$ is a state, we
denote by $\rho'$ the mapping obtained by renaming variables $x_i$ to $x_i'$ in
$\rho$. A formula $\varphi(x)$ is interpreted as the set
$[\![\varphi(x)]\!]$ of all states $\sigma$ such that
$\Re, \sigma \models \varphi(x)$. We define the set $Reach(\Phi)(\Theta)$
of states reachable from the states represented by $\Theta$ via the transition
predicate $\Phi$ as the smallest set such that (i)
$[\![\Theta]\!] \subseteq Reach(\Phi)(\Theta)$ and (ii) the state
$\sigma \in Reach(\Phi)(\Theta)$ whenever
$\Re, \rho, \sigma' \models \Phi(x, x')$ for some
$\rho \in Reach(\Phi)(\Theta)$. Since the theory $\Re$ is fixed, we shall not
mention it explicitly when we talk about satisfiability and validity in $\Re$.
Thus, validity in $\Re$ is denoted by $\models$.

A formula transformer $\mathcal{T}$ is a function mapping formulas to formulas.
The strongest postcondition transformer, denoted by $SP(\Phi)$, is defined as
$SP(\Phi)(\varphi(x)) = \exists y.(\Phi(y, x) \wedge \varphi(y))$. The formula
$SP(\Phi)(\varphi(x))$ denotes the set of states reachable in one step from the
set of states represented by $\varphi$. Similarly, the weakest precondition
transformer, $WP(\Phi)$, is defined as
$WP(\Phi)(\varphi(x)) = \forall y.(\Phi(x, y) \Rightarrow \varphi(y))$.

A fixed point of a formula transformer $\mathcal{T}$ is a formula $\varphi$ such
that $\models \mathcal{T}(\varphi) \Leftrightarrow \varphi$. A formula
transformer $\mathcal{T}$ is monotonic if
$\models \mathcal{T}(\varphi) \Rightarrow \mathcal{T}(\psi)$ whenever
$\models \varphi \Rightarrow \psi$. A least fixed point of $\mathcal{T}$,
denoted by $\mu\psi.\mathcal{T}(\psi)$, is a fixed point $\varphi$ such that for
any other fixed point $\psi$ of $\mathcal{T}$, it is the case that
$\models (\varphi \Rightarrow \psi)$. A greatest fixed point of $\mathcal{T}$,
denoted by $\nu\psi.\mathcal{T}(\psi)$, is a fixed point $\varphi$ such that for
any other fixed point $\psi$ of $\mathcal{T}$, it is the case that
$\models (\psi \Rightarrow \varphi)$. Whenever the transition system
$(V, \Theta, \Phi)$ is clear from the context, we define the transformer
$\mathcal{I}$ by $\mathcal{I}(\varphi) = SP(\Phi)(\varphi) \vee \Theta$. Note
that the transformer $\mathcal{I}$ is monotonic. The least fixed point of this
operator, $\mu\varphi.\mathcal{I}(\varphi)$, whenever it exists in the
first-order language, represents the set $Reach(\Phi)(\Theta)$ of reachable
states.
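
For a finite system, these transformers can be written directly on sets of states. The following Python sketch is an explicit-state analogue, assuming that the transition predicate is given as a set of (source, target) pairs; the function names and the five-state example are hypothetical.

```python
# Sketch only: formulas are replaced by explicit sets of states, and the
# transition predicate by a set of (source, target) pairs.

def sp(trans, X):
    """Strongest postcondition: states reachable in one step from X."""
    return {t for (s, t) in trans if s in X}

def wp(trans, X, states):
    """Weakest precondition: states all of whose successors lie in X."""
    return {s for s in states if all(t in X for (u, t) in trans if u == s)}

def lfp(transformer):
    """Least fixed point of a monotonic transformer on a finite lattice."""
    current = set()
    while True:
        nxt = transformer(current)
        if nxt == current:
            return current
        current = nxt

def reach(trans, init):
    """Least fixed point of X -> SP(X) | init, the set of reachable states."""
    return lfp(lambda X: sp(trans, X) | init)

states = {0, 1, 2, 3, 4}
trans = {(0, 1), (1, 2), (2, 0), (3, 4)}
reachable = reach(trans, {0})
```

On infinite-state systems the same iteration is performed on formulas and need not terminate, which is the difficulty the rest of the paper addresses.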

2.1 Invariants 

A formula $\varphi$ is an S-invariant if
$Reach(\Phi)(\Theta) \subseteq [\![\varphi]\!]$. Thus, an invariant describes
an over-approximation of the set of reachable states. An S-inductive invariant
is a formula $\varphi$ such that (i) $\varphi$ is an S-invariant, and (ii)
$\varphi$ is inductive, i.e., $\models SP(\Phi)(\varphi) \Rightarrow \varphi$.
Condition (ii) can be equivalently stated as
$\models \varphi \Rightarrow WP(\Phi)(\varphi)$. In other words, $\varphi$ is an
S-inductive invariant if $\models \mathcal{I}(\varphi) \Rightarrow \varphi$.
Note that the definition does not require an equivalence, but only an
implication.

It is easy to establish that the set of reachable states $Reach(\Phi)(\Theta)$
of a system $S$ represents the strongest (inductive) invariant. By this we mean
that if $\psi$ is any other (inductive) invariant, then
$Reach(\Phi)(\Theta) \subseteq [\![\psi]\!]$. However, note that if $\varphi$ is
an inductive invariant, and $\models (\varphi \Rightarrow \psi)$, then $\psi$
need not be an inductive invariant because $\psi$ might violate condition (ii).
For purposes of this paper, we will only be interested in inductive invariants.
Thus, we are not interested in just obtaining any over-approximation of the set
of reachable states, but only those that also satisfy condition (ii). This is
because the inductive property provides a sufficient local characterization of
the invariance property, which makes the task of proving easier.

Given a transition system $S = (V, \Theta, \Phi)$, the converse transition
system $S^{-1} = (V, \Theta, \Phi^{-1})$ is defined by
$\Phi^{-1}(x, y) = \Phi(y, x)$. The following well-known theorem says that if
none of the initial states is backward reachable from the states represented by
$\varphi$, then $\neg\varphi$ is an invariant.

Theorem 1. Let $S = (V, \Theta, \Phi)$ be a transition system and $\varphi$ an
arbitrary formula. If $\psi$ is such that
$\models (SP(\Phi^{-1})(\psi) \vee \varphi) \Rightarrow \psi$ and the formula
$\Theta \wedge \psi$ is unsatisfiable, then $\neg\psi$ is an S-inductive
invariant.

Corollary 1. If $Reach(\Phi^{-1})(\varphi) \cap [\![\Theta]\!] = \emptyset$,
then the formula corresponding to the complement of the set
$Reach(\Phi^{-1})(\varphi)$ is an S-inductive invariant.

We remark here that although application of the $SP(\Phi)$ transformer is called
"forward propagation", the term "backward propagation" is typically used for
the transformer $WP(\Phi)$. But there is no anomaly here as the transformers
$SP(\Phi^{-1})$ and $WP(\Phi)$ are duals in the sense that
$SP(\Phi^{-1})(\varphi)$ is logically equivalent to
$\neg WP(\Phi)(\neg\varphi)$. Hence, Theorem 1 can be stated in terms of
$WP(\Phi)$. It also follows that if formula $\varphi$ is an invariant, then the
formula $\nu\psi.\, \varphi \wedge WP(\Phi)(\psi)$ is an inductive invariant
that is a strengthening of $\varphi$.² Similarly, it is easy to see that there
is a corresponding connection between the $SP(\Phi)$ and $WP(\Phi^{-1})$
transformers.

² It follows from this duality that the least (greatest) fixed point iterations
of $SP(\Phi^{-1}) \vee \varphi$ are logically equivalent to the negations of the
greatest (least) fixed point iterations of $WP(\Phi) \wedge \neg\varphi$.

3 Inductive Invariant Generation

In this section, we discuss the problem of automatically generating some useful
inductive invariants for a given transition system. It is a simple observation
that the greatest fixed point $\nu\varphi.\mathcal{I}(\varphi)$, whenever it
exists, is an S-inductive invariant.

Lemma 1. Let $S = (V, \Theta, \Phi)$ be a transition system. Recursively define
the sequence of formulas $\varphi_0, \varphi_1, \ldots$ as follows:

\[ \varphi_0 = true \qquad \varphi_{i+1} = SP(\Phi)(\varphi_i) \vee \Theta \]

Then, every formula $\varphi_i$ is an S-inductive invariant. Furthermore, every
formula $\varphi_i$ in the above sequence can be decomposed as
$\psi_i \vee \chi_i$, where

\[
\begin{aligned}
\psi_0 &= false & \psi_{i+1} &= SP(\Phi)(\psi_i) \vee \Theta \\
\chi_0 &= true & \chi_{i+1} &= SP(\Phi)(\chi_i).
\end{aligned}
\]

The sequence $\psi_0, \psi_1, \ldots$ represents iterations in a least fixed
point computation of the $\mathcal{I}$ transformer. The sequence
$\chi_0, \chi_1, \ldots$ represents the greatest fixed point component. The
formulas $\psi_i$ provide successive under-approximations of the set
$Reach(\Phi)(\Theta)$ of reachable states. The formulas $\varphi_i$ are
inductive over-approximations. The sequence $\psi_0, \psi_1, \ldots$ usually
does not terminate, whereas the sequence $\varphi_0, \varphi_1, \ldots$ often
terminates with very weak invariants.
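
The three sequences of Lemma 1 can be computed side by side on a small finite system. The sketch below is illustrative only: the six-state system, which contains an unreachable cycle between states 3 and 4, is hypothetical and is not the system of any figure in the paper.

```python
# Sketch only: explicit-state version of the sequences phi_i, psi_i, chi_i.

def sp(trans, X):
    """Strongest postcondition on sets of states."""
    return {t for (s, t) in trans if s in X}

states = set(range(6))
trans = {(0, 1), (1, 2), (2, 1), (3, 4), (4, 3), (5, 0)}
init = {0}

phi, psi, chi = set(states), set(), set(states)   # true, false, true
history = []
for _ in range(len(states) + 1):
    history.append((set(phi), set(psi), set(chi)))
    phi = sp(trans, phi) | init       # phi_{i+1} = SP(phi_i) or Theta
    psi = sp(trans, psi) | init       # psi_{i+1} = SP(psi_i) or Theta
    chi = sp(trans, chi)              # chi_{i+1} = SP(chi_i)
```

In this example the under-approximations stabilize at {0, 1, 2}, while the over-approximations stabilize at {0, 1, 2, 3, 4}: the unreachable cycle is never removed by the greatest fixed point iteration.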

It should be observed here that the greatest fixed point of the
$SP(\Phi)(\_) \vee \Theta$ transformer characterizes states $\sigma$ such that
there exists a backward path starting from $\sigma$ which is either infinite, or
contains some initial state. In case of finite state transition systems, this is
exactly the set of states that either belong to a strongly connected component,
or that are reachable from either some initial state or some strongly connected
component. Hence, the greatest fixed point may not be the strongest S-inductive
invariant even in the case of finite systems. Despite its shortcomings, this
simple method is attractive since (i) we do not need to detect that the
iterations have converged³, and (ii) every formula $\varphi_i$ is an S-inductive
invariant. Detecting convergence is difficult as it involves deciding if
$\models \varphi_i \Rightarrow \varphi_{i+1}$.

³ If $\varphi$ is an S-invariant, then every iteration in the greatest fixed
point computation of $WP(\Phi)(\_) \wedge \varphi$ is also an S-invariant. But,
if $\varphi$ is inductive, then this method yields

Example 1. Consider the transition system over ten states presented in Figure 1.
States are represented by nodes with integer labels and transitions are
represented by edges. State 1 is the initial state. Clearly, the set of
reachable states is the set {1, 2, 3, 4}. The greatest fixed point of the
$SP(\Phi) \vee \Theta$ transformer is the set consisting of states
{1, 2, 3, 4, 5, 6, 7, 8, 9}.

Fig. 1. A finite state transition system.

3.1 Widening and Narrowing

In the case when the state space is either infinite, or finite but too large, the
symbolic computation of (greatest or least) fixed points of various transformers
is restricted by the finite space and time resources available. A well-known
solution to this problem is the use of widening and narrowing to respectively
enhance the least and greatest fixed point computation (with gains obtained both
in terms of space and time).

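A widening operator
$\nabla : \mathcal{F} \times \mathcal{F} \rightarrow \mathcal{F}$ is a function
such that for all formulas $\varphi, \varphi' \in \mathcal{F}$,
$\models (\varphi \vee \varphi') \Rightarrow \nabla(\varphi, \varphi')$.
Similarly, a narrowing operator
$\Delta : \mathcal{F} \times \mathcal{F} \rightarrow \mathcal{F}$ is a function
such that for all formulas $\varphi, \varphi' \in \mathcal{F}$,
$\models \Delta(\varphi, \varphi') \Rightarrow (\varphi \wedge \varphi')$. Thus,
logical disjunction $\vee$ is a trivial widening operator, and logical
conjunction $\wedge$ is a trivial narrowing operator.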

The definitions of widening and narrowing are slightly different from the
standard ones [8,9]. First, we do not include any conditions to guarantee that
increasing (decreasing) sequences are transformed to finite, hence converging,
increasing (decreasing) sequences by widening (narrowing). Secondly, in the case
of narrowing, the standard definition requires that whenever
$\varphi' \Rightarrow \varphi$, the formula $\Delta(\varphi, \varphi')$ is such
that $\varphi' \Rightarrow \Delta(\varphi, \varphi')$ and
$\Delta(\varphi, \varphi') \Rightarrow \varphi$. In our definition,
$\Delta(\varphi, \varphi')$ is stronger than both $\varphi$ and $\varphi'$ as
our interest is in the use of narrowing to obtain under-approximations of the
greatest fixed point. But we have to be careful not to eliminate any reachable
states by overly aggressive under-approximation, see Lemma 3.

A particularly simple narrowing operator, denoted by $\Delta(\psi)$, is defined by
$\Delta(\psi)(\phi, \phi') = \phi \land \phi' \land \psi$, where $\psi$ is an arbitrary formula. Similarly, we can
define $\nabla(\psi)(\phi, \phi') = \phi \lor \phi' \lor \psi$. Since we are interested in generating inductive
invariants, it turns out that in order to guarantee correctness, we can use any
arbitrary widening operator, but not any narrowing operator.
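
As a finite-state illustration of these definitions, the following sketch models formulas as Python sets of states, so that implication becomes set inclusion, disjunction becomes union, and conjunction becomes intersection. The function names (`widen`, `narrow`, and the two checkers) and the example sets are assumptions made for illustration; they are not part of the paper.

```python
# Formulas are modelled as frozensets of states (an assumption for illustration):
# implication = subset, disjunction = union, conjunction = intersection.

def widen(psi, phi, phi_prime):
    """Parameterized widening: phi OR phi' OR psi."""
    return frozenset(phi) | frozenset(phi_prime) | frozenset(psi)

def narrow(psi, phi, phi_prime):
    """Parameterized narrowing: phi AND phi' AND psi."""
    return frozenset(phi) & frozenset(phi_prime) & frozenset(psi)

def is_widening_result(phi, phi_prime, result):
    """Defining property of widening: (phi OR phi') implies the result."""
    return (frozenset(phi) | frozenset(phi_prime)) <= frozenset(result)

def is_narrowing_result(phi, phi_prime, result):
    """Defining property of narrowing: the result implies (phi AND phi')."""
    return frozenset(result) <= (frozenset(phi) & frozenset(phi_prime))

# Hypothetical example sets.
phi, phi_prime, psi = {1, 2, 3}, {2, 3, 4}, {3, 5}
widened = widen(psi, phi, phi_prime)
narrowed = narrow(psi, phi, phi_prime)
```

Both results satisfy the defining implications for any choice of `psi`, which is why the parameter can be picked freely as far as the definitions are concerned; soundness of narrowing for invariant generation needs the extra condition stated in Lemma 3.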

Lemma 2. [Upward iteration sequence with widening] Let $\psi_0, \psi_1, \ldots$ be a
sequence of formulas such that $\psi_0$ is $\Theta$, and for every $i > 0$, either

(i) $\psi_i$ is $\mathrm{SP}(\mathcal{T})(\psi_{i-1}) \lor \psi_{i-1}$, or

(ii) $\psi_i$ is $\nabla(\alpha_i)(\psi_{i-2}, \psi_{i-1})$, where $\alpha_i$ is any arbitrary formula.

Then^, if for some $n > 0$, $\models \mathrm{SP}(\mathcal{T})(\psi_n) \rightarrow \psi_n$, then the formula $\psi_n$ is an
S-inductive invariant.



Lemma 3. [Downward iteration sequence with narrowing] Let $\phi_0, \phi_1, \ldots$ be a
sequence of formulas such that $\phi_0$ is true, and for every $i > 0$, either

(i) $\phi_i$ is $\mathrm{SP}(\mathcal{T})(\phi_{i-1}) \lor \Theta$, or

(ii) $\phi_i$ is $\Delta(\beta_i)(\phi_{i-2}, \phi_{i-1})$, where $\beta_i$ is some S-inductive invariant.

Then, for every $i$, $\phi_i$ is an S-inductive invariant such that $\models \phi_i \rightarrow \phi_{i-1}$.

Lemma 3 extends the greatest fixed point iterations in Lemma 1 by a narrowing
operator. Similarly, Lemma 2 extends the least fixed point computation
that is hidden inside the iterations in Lemma 1 by a widening operator.

We obtain the formula $\beta_i$ used in Lemma 3 by identifying strongly connected
components consisting of unreachable states. This is achieved using backward
propagation from an unreachable state, as outlined in Theorem 1. These
unreachable states are not automatically eliminated by the greatest fixed point
computation outlined in Lemma 1. Furthermore, an S-inductive invariant
obtained using Lemma 2 can be used in Step (ii) of Lemma 3. Thus, Lemma 3
gives a method for systematically strengthening known invariants.

Example 2. Following up on Example 1, let $N = \{1, 2, \ldots, 10\}$ denote the set
of all states. In order to strengthen the over-approximation, viz. $N - \{10\}$, of
the set of reachable states obtained via the greatest fixed point computation, we
can try removing certain states. But if we remove a subset of states that is not
strongly connected, the subsequent fixed point computation may no longer be
monotonic, and could fail to converge.

For instance, removing state 5 from the above set gives a new set $N_1 =
N - \{5, 10\}$. Now, $\mathrm{SP}(\mathcal{T})(\phi_{N_1}) \lor \Theta$, where $\phi_{N_1}$ is the characteristic predicate
of $N_1$, represents the set $N_2 = N - \{6, 10\}$. Clearly, $N_2 \not\subseteq N_1$, and hence the
sequence of formulas obtained in the greatest fixed point computation is no
longer monotonic. Note that all formulas in the sequence are invariants, but
they are not inductive.

In order to identify unreachable states, we start with the set $N_3 = \{7, 8\}$
and assign $p$ in Theorem 1 to the characteristic predicate of $N_3$. The least
fixed point of the backward propagation from $N_3$ (see Theorem 1) represents
the set $N_4 = \{5, 6, 7, 8, 9, 10\}$. Now, since the formula $\Theta \land p_{N_4}$ is
unsatisfiable (i.e. the set $\{1\} \cap N_4 = \emptyset$), it follows from Theorem 1 that
the set $N_5 = \{1, 2, 3, 4\}$ represented by $\neg p_{N_4}$ is an S-inductive invariant.
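
The computations in Examples 1 and 2 can be replayed on an explicit graph. Figure 1 is not reproduced in the text, so the edge relation below is a hypothetical one, chosen only so that it agrees with every set stated in the two examples; the helper names (`post`, `pre`, `iterate`) are likewise assumptions. This is a sketch on explicit state sets, not the symbolic procedure of the paper.

```python
# Hypothetical edges: an assumption chosen to agree with the sets stated in
# Examples 1 and 2, since Figure 1 itself is not available here.
EDGES = {(1, 2), (2, 3), (3, 4), (4, 1),
         (5, 6), (6, 7), (7, 8), (8, 9), (9, 5), (10, 9)}
STATES = frozenset(range(1, 11))
INIT = frozenset({1})

def post(states):
    """Successors of a set of states (explicit-state forward propagation)."""
    return frozenset(t for (s, t) in EDGES if s in states)

def pre(states):
    """Predecessors of a set of states (one backward propagation step)."""
    return frozenset(s for (s, t) in EDGES if t in states)

def iterate(step, start):
    """Apply `step` until the set stops changing; terminates on finite systems."""
    current = frozenset(start)
    while True:
        following = step(current)
        if following == current:
            return current
        current = following

reachable = iterate(lambda s: s | post(s), INIT)     # least fixed point
gfp = iterate(lambda s: post(s) | INIT, STATES)      # greatest fixed point
backward = iterate(lambda s: s | pre(s), {7, 8})     # backward from N3

# If the backward-closed set misses the initial state, its complement is an
# inductive invariant (the situation described via Theorem 1).
strengthened = STATES - backward if not (backward & INIT) else None

# Removing a single state that is not a strongly connected set on its own:
n1 = STATES - {5, 10}
n2 = post(n1) | INIT
```

On this graph, `n2` contains state 5 again but has lost state 6, which reproduces the loss of monotonicity described above, while the backward closure of {7, 8} removes the whole unreachable cycle at once.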

4 An Any-Time Algorithm for Generating Inductive 
Invariants 

The transition predicate $\mathcal{T}$ of a transition system $S = (V = \{x_1, \ldots, x_n\}, \Theta, \mathcal{T})$
is typically specified using a finite set of guarded transitions, where a guarded
transition consists of a guard $\gamma \in \mathcal{F}$ and a finite set of assignments
$\{x_1 := e_1(x), \ldots, x_n := e_n(x)\}$. A guarded transition $\tau$ is written as

$\gamma \mapsto x_1 := e_1(x); \ldots; x_n := e_n(x)$

where $e_i$ is some expression with free variables in the set $x$. We shall also use
the compact notation $x := e(x)$ to represent the above assignments.

^ Note that the lemma also holds if we drop the word “inductive” from the statement.

A typical specification of a guarded transition system contains at least one
control variable, usually the program counter $pc \in V$, which takes
values from a finite set, say $\{1, \ldots, p\}$. Control states are defined by formulas
of the form $pc = i$, $i \in \{1, \ldots, p\}$. This transition system then has $p$ different
control states. Additionally, we assume that the source states of each guarded
transition belong to some fixed source control state, $pc = i$ (and similarly for
the target states), so that each transition $\tau$ can be written as

$pc = i \land \gamma \mapsto x := e(x); pc := j$

where $x$ denotes variables in $V - \{pc\}$. In this case, we define $src(\tau) = i$ and
$tgt(\tau) = j$. By $\mathcal{T}_\tau(x, x')$, we denote the formula $\gamma(x) \land x' = e(x)$. If $T$ is
a set of such transitions, then the transition predicate $\mathcal{T}$ is itself defined by
$\bigvee_{\tau \in T} pc = src(\tau) \land pc' = tgt(\tau) \land \mathcal{T}_\tau$. Similarly, we assume that $\models \Theta \rightarrow pc = 1$.

Whenever such a decomposition of the state space into finitely many control
states is available such that every transition has a unique source and target
control state, the S-invariant can be maintained as a conjunction of local invariants
indexed by the control locations. We assume that every formula is represented
as an array of formulas indexed by integers $\{1, \ldots, p\}$. Given an S-inductive
invariant $\phi$ (as an array of formulas), and a transition predicate $\mathcal{T}$, the function
propagation($\Theta$, $\mathcal{T}$, $\phi$, $k$) returns the strengthened S-inductive invariant.

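function propagation($\Theta$, $\mathcal{T}$, $\phi$, $k$) {
    let $\Theta$ be $pc = 1 \land \Theta'$;
    for $k$ iterations do: for every $i$ in parallel do {
        $T_i := \{\tau \in T : tgt(\tau) = i\}$;
        $\phi[i] := \bigvee_{\tau \in T_i} \mathrm{SP}(\mathcal{T}_\tau)(\phi[src(\tau)]) \lor \Theta'$    if $i = 1$
        $\phi[i] := \bigvee_{\tau \in T_i} \mathrm{SP}(\mathcal{T}_\tau)(\phi[src(\tau)])$    if $i \neq 1$;
        $\phi[i] := \Re$-simplify($\phi[i]$);
    }
    return($\phi$);
}

The function $\Re$-simplify performs quantifier-elimination and simplification in
the theory $\Re$ and is described in Section 4.2.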
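
The propagation step can be sketched for a finite data domain, where each local invariant is an explicit set of values rather than a formula, so that no simplification step is needed. Everything below is an assumption made for illustration: the tuple encoding of guarded transitions, the toy two-location system, and the bounded domain 0..7.

```python
def propagation(theta_prime, transitions, phi, k):
    """Run k parallel propagation steps over an array of local invariants.

    theta_prime: set of initial data values at control location 1
    transitions: list of (src, tgt, guard, update) tuples
    phi:         dict mapping control location -> frozenset of data values
    """
    for _ in range(k):
        updated = {}
        for i in phi:
            image = set()
            for src, tgt, guard, update in transitions:
                if tgt == i:
                    # Explicit-state strongest postcondition of one transition.
                    image |= {update(v) for v in phi[src] if guard(v)}
            if i == 1:
                image |= theta_prime      # affirm the initial states
            updated[i] = frozenset(image)
        phi = updated                      # all locations updated in parallel
    return phi

# Hypothetical toy system: location 1 adds 2 to x, location 2 subtracts 2.
UNIVERSE = frozenset(range(8))
TRANSITIONS = [
    (1, 2, lambda x: x < 6, lambda x: x + 2),
    (2, 1, lambda x: x >= 2, lambda x: x - 2),
]
THETA_PRIME = frozenset({0})

# Starting from "true" gives greatest fixed point iterations,
# starting from "false" gives least fixed point iterations.
from_true = propagation(THETA_PRIME, TRANSITIONS,
                        {1: UNIVERSE, 2: UNIVERSE}, 3)
from_false = propagation(THETA_PRIME, TRANSITIONS,
                         {1: frozenset(), 2: frozenset()}, 3)
```

The same function serves both directions of the analysis: the only difference between the over-approximating and the under-approximating run is the array it is started from.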

Lemma 4. Let $S = (V, \Theta, \mathcal{T})$ be a transition system and let $\phi_0$ be an array of
formulas initialized to true. Let $\psi_k$ denote the array propagation($\Theta$, $\mathcal{T}$, $\phi_0$, $k$)
of formulas (assuming $\Re$-simplify always returns equivalent formulas), and $\phi_k$
be as defined in Lemma 1. Then, for all $k \geq 0$, $\models \phi_k \leftrightarrow \bigwedge_{i=1}^{p}(pc = i \rightarrow \psi_k[i])$.
Consequently, the formula $\bigwedge_{i=1}^{p}(pc = i \rightarrow \psi_k[i])$ is an S-inductive invariant, for
every $k$.

Notice that the formula $\bigwedge_{i=1}^{p}(pc = i \rightarrow \phi[i])$ is equivalent to the formula
$\bigvee_{i=1}^{p}(pc = i \land \phi[i])$ under the assumption that $pc$ takes exactly one value in
$\{1, \ldots, p\}$. The computations outlined in other lemmas and theorems can be
suitably cast in terms of local invariants at control locations.
4.1 Combining $\mathrm{SP}(\mathcal{T})$ and $\mathrm{SP}(\mathcal{T}^{-1})$ Iterations

The basic algorithm for the automatic generation of inductive invariants con- 
sists of affirmation and propagation steps — the essence of which is captured in 
Lemma 1 and function propagation. In order to get stronger invariants, we 
propose the use of narrowing and widening. 

The function widening($\psi$, $\phi$, $k$) starts with a given under-approximation $\psi$
of the set of reachable states, and widens it using a subformula $\alpha$ of the
over-approximation $\phi$. If this widening yields an S-inductive invariant (see Lemma 2)
in $k$ propagation steps, then the function returns this invariant, otherwise it just
returns true^.

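function widening($\psi$, $\phi$, $k$) {
    $\chi := \psi$;
    choose $j \in \{1, \ldots, p\}$ and a formula $\alpha$ s.t.
        $\phi[j]$ is of the form $\phi' \lor \alpha$, and $\psi[j] \land \alpha$ is satisfiable;
    $\chi[j] := \chi[j] \lor \alpha$;    /* widening */
    $\chi$ := propagation($\Theta$, $\mathcal{T}$, $\chi$, $k$);
    if ($\models$ propagation($\Theta$, $\mathcal{T}$, $\chi$, 1)$[i] \rightarrow \chi[i]$ for all $i$)
        return($\chi$);    /* new invariant */
    return(true);
}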

Lemma 5. For any value of the constant $k$, if $\chi$ denotes the array of formulas
returned by widening($\psi$, $\phi$, $k$), then the formula $\bigwedge_{i=1}^{p}(pc = i \rightarrow \chi[i])$ is an
S-inductive invariant.

Strongly connected components of unreachable states are detected using
backward propagation, and if successful, this information is used for strengthening
the current invariant. The subroutine narrowing($\psi$, $\phi$, $k$) chooses a subformula
$\beta$ of the over-approximation $\phi$ which could possibly represent unreachable
states. Thereafter, it computes the set of states that are backward reachable from
the conjectured unreachable states $\beta$, and if we successfully terminate without
intersecting $\Theta$ (see Theorem 1), then we again have an S-inductive invariant.

function narrowing($\psi$, $\phi$, $k$) {
    choose $j \in \{1, \ldots, p\}$ and a formula $\beta$ s.t.
        $\phi[j]$ is of the form $\phi' \lor \beta$, and $\psi[j] \land \beta$ is unsatisfiable;
    $\chi$ := propagation($pc = j \land \beta$, $\mathcal{T}^{-1}$, false, $k$);
    if ($\models$ propagation($pc = j \land \beta$, $\mathcal{T}^{-1}$, $\chi$, 1)$[i] \rightarrow \chi[i]$ for all $i$)
        if ($\models \neg(\chi \land \Theta)$)
            return(Invariant($\neg\chi$));
        else if ($\chi \land \Theta$ is satisfiable)    /* $\beta$ is reachable */
            return(Reachable($pc = j \land \beta$));
    return(Invariant(true));
}

^ We shall overload true (false) to also denote arrays in which every element is true
(false), and use assignments between arrays to mean element-wise copying.



The return value Reachable($\rho$) of the function narrowing($\psi$, $\phi$, $k$) says that
the states represented by $\rho$ are reachable, and the return value Invariant($\rho$)
denotes that the formula represented by $\rho$ is an inductive invariant.

Lemma 6. For any value of the constant $k$, if the function narrowing($\psi$, $\phi$, $k$)
returns Reachable($\rho$), then $[\![\rho]\!] \subseteq Reach(\mathcal{T})(\Theta)$. Similarly, for any value of
the constant $k$, if narrowing($\psi$, $\phi$, $k$) returns Invariant($\rho$), then the formula
$\bigwedge_{i=1}^{p}(pc = i \rightarrow \rho[i])$ is an S-inductive invariant.

Finally, we outline a procedure that uses the various functions described
above by combining the least fixed point and greatest fixed point computations
with narrowing and widening. In the procedure, the formula $\psi$ always stores an
under-approximation of the set of reachable states, and the formula $\phi$ always
stores an S-inductive invariant. The procedure essentially consists of doing one
of four different steps: (i) augmenting $\psi$ using propagation($\Theta$, $\mathcal{T}$, $\psi$, $k$), where
$k$ is some constant; (ii) strengthening the current invariant $\phi$ using the function
propagation($\Theta$, $\mathcal{T}$, $\phi$, $k$); (iii) use of widening on the under-approximation
for generating an invariant; and (iv) use of narrowing to detect and eliminate
unreachable states from the over-approximation.

/* Given: $S = (V, \Theta, \mathcal{T})$, a transition system with $p$ control states.
   The transition predicate $\mathcal{T}$ is indexed by guarded transitions.
   $k$ is an upper bound on the number of iterations. */

Procedure InvGen:
    $\phi$, $\psi$: Array [1...p] of formula
Initialization:
    $\phi$ := true;
    $\psi$ := false;
repeatedly do the following {{
    $\psi$ := propagation($\Theta$, $\mathcal{T}$, $\psi$, $k$);
    if ($\models$ propagation($\Theta$, $\mathcal{T}$, $\psi$, 1)$[i] \rightarrow \psi[i]$ for all $i$)
        $\phi := \psi$; terminate the program;
} OR {
    $\phi$ := propagation($\Theta$, $\mathcal{T}$, $\phi$, $k$);
} OR {
    $\phi := \phi \land$ widening($\psi$, $\phi$, $k$);
} OR {
    if (narrowing($\psi$, $\phi$, $k$) returns Reachable($\beta$))
        $\psi[j] := \psi[j] \lor \chi$ where $\beta$ is $pc = j \land \chi$;
    else (assuming narrowing returns Invariant($\beta$))
        $\phi[i] := \phi[i] \land \beta[i]$ for all $i$;
}}
Theorem 2. Let $\phi$ be the array of formulas in the procedure InvGen. Then,
at any stage of the procedure, the formula $\bigwedge_{i=1}^{p}(pc = i \rightarrow \phi[i])$ is an S-inductive
invariant.

Our procedure does not consider the control structure of the transition graph 
to generate invariants. Though specific control structures, like loops, are not rel- 
evant for correctness of the basic procedure, they can be important in choosing 
specific points for widening or narrowing [6]. We wish to point out that the pro- 
cedure is tolerant to theorem proving failures and only assumes a refutationally 
complete prover. In particular, note that the satisfiability test in widening can 
be eliminated. 

4.2 Quantifier Elimination and Simplification 

We remark here that implementation of propagation requires elimination of
existential quantifiers. The existential quantifier in $\mathrm{SP}(\mathcal{T}^{-1})(\phi)$ and the universal
quantifier in $\mathrm{WP}(\mathcal{T})(\phi)$ can both be easily eliminated using substitutions. The
quantifiers in $\mathrm{SP}(\mathcal{T})(\phi)$ and $\mathrm{WP}(\mathcal{T}^{-1})(\phi)$ cannot be eliminated so easily in
general. But in special cases, for instance when the transition is “reversible” (for
example, the effect of the assignment $x := x + y$ can be reversed by the assignment
$x := x - y$), quantifier elimination reduces to substitution again. In cases where
exact quantifier elimination is not possible, we can still get a correct procedure
using a quantifier elimination procedure that returns a “weaker” formula, i.e.,
we do not need an equivalence preserving quantifier elimination procedure.

Let $\Re$-simplify be a function such that $\models \phi \rightarrow \Re$-simplify($\phi$). We shall
denote the formula $\Re$-simplify($\phi$) by $\hat{\phi}$ in the next theorem.

Theorem 3. Let $\psi_0, \psi_1, \ldots, \psi_i$ be an upward iteration sequence with widening
and $\phi_0, \phi_1, \ldots, \phi_i$ be a downward iteration sequence with narrowing (see
Lemmas 2 and 3). Then the sequence $\psi_0, \psi_1, \ldots, \psi_i, \hat{\psi}_i$ is also an upward iteration
sequence with widening. Similarly, the sequence $\phi_0, \phi_1, \ldots, \phi_{i-1}, \phi'_i$, where $\phi'_i$ is
$\phi_{i-1} \land \hat{\phi}_i$, is also a downward iteration sequence with narrowing.

Note that the formula $\phi'_i$ in Theorem 3 can be seen as the result of “narrowing”
in the sense of [9]. Theorem 3 makes it possible for simple (and possibly
incomplete) quantifier elimination procedures to suffice for our purposes. For
instance, when it is not possible to eliminate the existential quantifier from
$\exists x.\, p(x) \land q(x)$, we could weaken this to $\exists x.\, p(x) \land \exists x.\, q(x)$ and perform
quantifier elimination on atomic formulas. With suitable modifications as outlined in
Theorem 3, our procedure continues to be correct. In fact, such simplifications
help in the convergence of the iterations as well.
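
The weakening step used here is sound because the existential quantifier distributes over conjunction in one direction only. The following LaTeX fragment spells this out; the counterexample predicates are chosen for illustration and are not taken from the paper.

```latex
% Sound direction: a single witness for p and q serves as a witness for each.
\models \exists x.\,(p(x) \land q(x)) \;\rightarrow\; (\exists x.\,p(x)) \land (\exists x.\,q(x))

% The converse fails in general. Illustrative counterexample over the integers:
%   p(x) \equiv (x = 0), \quad q(x) \equiv (x = 1)
% Then (\exists x.\,p(x)) \land (\exists x.\,q(x)) holds,
% but \exists x.\,(p(x) \land q(x)) does not.
```

Since the weakened formula is implied by the exact one, it fits the requirement placed on the simplification function in this section.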

Finally, as pointed out in Lemma 1, implementation of the above procedure
can be optimized by combining the arrays $\psi$ and $\phi$ into a single array, say $\phi$.
If individual formulas $\phi[i]$ are always stored in disjunctive normal form, then
we can distinguish the disjuncts that would appear in $\psi[i]$ by marking them.
In this way, a single propagation step can be used to update both $\psi$ and $\phi$.
The implementation of the above procedure is being done in the framework of
SAL [1], which is a collection of different tools for analyzing concurrent systems.
4.3 Illustrative Examples

We shall provide certain simple examples to illustrate the procedure. The theory 
of interest is the theory of linear arithmetic, and we assume that we have an exact 
quantifier elimination procedure. 

Example 3. Consider the example outlined in Section 1. In this case, the least
fixed point sequence converges in two steps. In particular, we obtain the invariant
$(pc = inc \rightarrow x = 0) \land (pc = dec \rightarrow x = 2)$.



Example 4. A simplified version of the Bakery mutual exclusion protocol $S =
(V, \Theta, \mathcal{T})$ for two processes $p_1$ and $p_2$ accessing a critical section cs is given by
$V = \{y_1 : int, y_2 : int, pc_1 : \{1,2,3\}, pc_2 : \{1,2,3\}\}$, $\Theta$ is $pc_1 = 1 \land pc_2 =
1 \land y_1 = 0 \land y_2 = 0$, and $\mathcal{T}$ is defined by the following set of guarded transitions:

$pc_1 = 1 \mapsto y_1 := y_2 + 1; pc_1 := 2;$    // p1: try
$pc_1 = 2 \land (y_2 = 0 \lor y_1 < y_2) \mapsto pc_1 := 3;$    // p1: enter cs
$pc_1 = 3 \mapsto y_1 := 0; pc_1 := 1;$    // p1: exit cs
$pc_2 = 1 \mapsto y_2 := y_1 + 1; pc_2 := 2;$    // p2: try
$pc_2 = 2 \land (y_1 = 0 \lor y_2 < y_1) \mapsto pc_2 := 3;$    // p2: enter cs
$pc_2 = 3 \mapsto y_2 := 0; pc_2 := 1;$    // p2: exit cs

Since this system has an infinite number of reachable states, the least fixed point
computation sequence does not converge. We choose to define 9 control locations
based on the values of the $pc_1$ and $pc_2$ variables, and we shall use the notation $\phi[i, j]$
to denote the current invariant at control location $pc_1 = i \land pc_2 = j$. After a
few iterations, the greatest fixed point iterations yield a formula $\phi$ with the
following three local invariants (due to space restrictions, we are not writing
down the complete formula here):

$\phi[3, 1] : y_2 = 0$
$\phi[3, 2] : (y_2 = y_1 + 1) \lor (y_1 = 1 \land y_2 = 0)$
$\phi[3, 3] : (y_1 = 0 \land y_2 = 1) \lor (y_1 = 1 \land y_2 = 0)$



The disjunct $\beta$, defined as $y_1 = 0 \land y_2 = 1$, in control location $pc_1 = 3 \land pc_2 = 3$
can be conjectured to be unreachable (as the formula $\psi[3,3]$ in the least fixed
point iterations is always false), and for a suitable choice of $k$, the formula
$\chi$ := propagation($pc_1 = 3 \land pc_2 = 3 \land \beta$, $\mathcal{T}^{-1}$, false, $k$) contains the following
strongly connected set of unreachable states:

$\chi[3,3] : y_1 = 0$    $\chi[3,2] : y_1 = 0$    $\chi[2,3] : y_1 = 0$
$\chi[3,1] : y_1 = 0$    $\chi[2,2] : y_1 = 0$    $\chi[2,1] : y_1 = 0$



Similarly, we can eliminate the other possibility ($y_1 = 1 \land y_2 = 0$) at control
location $pc_1 = 3 \land pc_2 = 3$. This proves mutual exclusion. We can also use a
single widening step to obtain an inductive invariant strong enough to prove
mutual exclusion. Note that it was pointed out in [5] that the computation of
$\nu\phi.(\mathrm{WP}(\mathcal{T})(\phi) \land (pc_1 = 3 \land pc_2 = 3 \rightarrow false))$ terminates in a finite number of
steps and yields an invariant that proves mutual exclusion.
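
The protocol of Example 4 can also be explored concretely. The sketch below performs a depth-bounded breadth-first search over the six guarded transitions and looks for a state with both processes in the critical section. It is only a bounded sanity check under assumptions (the tuple encoding of states, the strict `<` in the guards as read from the listing, and the depth bound of 20 are choices made here), not a proof and not the symbolic procedure described in the paper.

```python
from collections import deque

def successors(state):
    """All states reachable in one guarded transition of Example 4."""
    pc1, pc2, y1, y2 = state
    result = []
    if pc1 == 1:                                   # p1: try
        result.append((2, pc2, y2 + 1, y2))
    if pc1 == 2 and (y2 == 0 or y1 < y2):          # p1: enter cs
        result.append((3, pc2, y1, y2))
    if pc1 == 3:                                   # p1: exit cs
        result.append((1, pc2, 0, y2))
    if pc2 == 1:                                   # p2: try
        result.append((pc1, 2, y1, y1 + 1))
    if pc2 == 2 and (y1 == 0 or y2 < y1):          # p2: enter cs
        result.append((pc1, 3, y1, y2))
    if pc2 == 3:                                   # p2: exit cs
        result.append((pc1, 1, y1, 0))
    return result

def explore(max_depth):
    """States reachable from the initial state in at most max_depth steps."""
    start = (1, 1, 0, 0)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for following in successors(state):
            if following not in seen:
                seen.add(following)
                frontier.append((following, depth + 1))
    return seen

states = explore(max_depth=20)
violations = [s for s in states if s[0] == 3 and s[1] == 3]
```

Because the ticket values grow without bound, any explicit search of this kind has to be cut off at some depth, which is exactly the limitation that the inductive invariant removes.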
Example 5. Consider the following transitions:

$pc = 1 \mapsto x := x + 2; y := y + 2; pc := 2;$
$pc = 2 \mapsto x := x - 2; y := y + 2; pc := 1;$

with initial state predicate $pc = 1 \land x = 0 \land y = 0$. Assuming that the variables
$x$ and $y$ are declared to be integers, neither the least fixed point sequence, nor
the greatest fixed point sequence converges. After a few iterations for computing
the greatest fixed point, the formula $\phi$ we obtain is:

$pc = 1 \rightarrow (x = 0 \land y = 0) \lor (x = 0 \land y = 4) \lor (x \geq 0 \land y \geq 8)$
$pc = 2 \rightarrow (x = 2 \land y = 2) \lor (x = 2 \land y = 6) \lor (x \geq 2 \land y \geq 10)$

The predicate $\geq$ can be replaced by the predicates $=$ and $>$. Now, the disjunct $\beta$
can be chosen as $x > 0 \land y \geq 8$ and it can be conjectured to be unreachable. The
formula propagation($pc = 1 \land \beta$, $\mathcal{T}^{-1}$, false, 2) contains the following strongly
connected set of unreachable states,

pc=l^a?>0Ap>8 pc — 2^x>2Ay>^ 

Conjunction of the negation of this formula with the original invariant $\phi$ gives
the following new invariant,

$pc = 1 \rightarrow (x = 0 \land y = 0) \lor (x = 0 \land y = 4) \lor (x = 0 \land y \geq 8)$
$pc = 2 \rightarrow (x = 2 \land y = 2) \lor (x = 2 \land y = 6) \lor (x = 2 \land y \geq 10)$

As before, widening can also be used in this case to obtain a similar
invariant.
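
The strengthened invariant of Example 5 can be checked on concrete states. The sketch below assumes that the second transition increments $y$ by 2 while decrementing $x$ by 2; this reading is inferred from the invariants listed in the example and should be treated as an assumption, as should the function names and the bounded grid used for the inductiveness check.

```python
def step(state):
    """One transition of the deterministic system of Example 5."""
    pc, x, y = state
    if pc == 1:
        return (2, x + 2, y + 2)
    # Assumed reading of the second transition: x decreases, y increases.
    return (1, x - 2, y + 2)

def strengthened_invariant(state):
    """The new invariant obtained after narrowing."""
    pc, x, y = state
    if pc == 1:
        return x == 0 and (y in (0, 4) or y >= 8)
    return x == 2 and (y in (2, 6) or y >= 10)

# The invariant holds along the (single) execution from the initial state.
trace = [(1, 0, 0)]
for _ in range(50):
    trace.append(step(trace[-1]))
holds_on_trace = all(strengthened_invariant(s) for s in trace)

# Bounded inductiveness check: every state on a small grid that satisfies the
# invariant steps to a state that satisfies it again.
grid = [(pc, x, y) for pc in (1, 2)
        for x in range(-5, 6) for y in range(-5, 30)]
inductive_on_grid = all(strengthened_invariant(step(s))
                        for s in grid if strengthened_invariant(s))
```

The grid check is bounded and therefore only suggestive; the argument in the text establishes inductiveness for all integer values.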

5 Related Work and Concluding Remarks 

Early work [12, 10] on generating invariants for sequential programs has been
extended to the case of reactive systems in [16, 13, 5, 11, 2]. These methods are
usually based on the propagation of invariants through the control structure of
the different components and on combining local invariants of each component
to construct global invariants of the system.

Forward and backward propagation using operators $\mathrm{SP}(\mathcal{T})$ and $\mathrm{WP}(\mathcal{T})$ is also
used in [5] as the basic technique for generating invariants. In addition,
over-approximations, such as the convex hull of the union of polyhedra, are used for
widening fixed point computations. Our approach differs in that we consider
simultaneous forward and backward propagation for computing both lower and
upper bounds of the reachable state sets. These bounds are also used for
computing suitable narrowing and widening operators. The combination of these
techniques usually yields much stronger invariants. Moreover, our algorithm is
an any-time algorithm, in the sense that it can be interrupted at any time to yield
the most refined inductive invariant computed up to the point of interruption.
The method of generalized reaffirmed invariance and propagation was
introduced in [2] and is based on affirming local invariants of the form $\mathrm{SP}(\mathcal{T}_\tau)(true)$
and propagating these local invariants along all transitions. This process of
affirmation and propagation, however, is performed only in the special case when
all the existential quantifiers arising in the process are trivial, i.e., when the
quantified variables do not occur in the rest of the formula; the twos example in
the introduction does not possess this property. The technique presented in [2]
also uses information about the control transition graph, especially knowledge
about cycles and how variables are manipulated in the cycle transitions, to
generate stronger invariants. In some cases, these stronger local invariants can be
generated by repeated propagation (in the stronger sense defined in this paper).
In general, however, the detection of unreachable cycles is crucial, as outlined in
Theorem 1.

Techniques based on abstraction have also been proposed for generating
invariants [14,3]. It appears attractive to first create (finite) abstractions for large
programs and then to use standard propagation techniques to obtain the set
of states reachable in the abstract system. This set can then be concretized to
obtain invariants of the concrete system. Abstraction can be cast as a special
widening strategy in our procedure. More specifically, let $(\alpha, \gamma)$ be an abstraction
and concretization pair (Galois connection) for a transition system $S = (V, \Theta, \mathcal{T})$.
Let $S_a = (V_a, \Theta_a, \mathcal{T}_a)$ denote the abstract transition system. If

is a least fixed point computation on the abstract transition system $S_a$, then one
obtains a corresponding fixed point computation with widening on the concrete
system

as follows: the formnla is SP(^)('0^*“^'^) (Step (i) of Lemma 2), and 

is '0(0 V 7 (a(' 0 t*))) (Step (ii) of Lemma 2). Now, if |= 7(0a ^) 0^*'^ , then it 

is also the case that |= 7('0a "^^^) ^ K Thns, the fixed point compntation on 

the abstract transition system can be suitably captured in the concrete system.
We shall not prove this claim here, but refer to [9] for a similar result.

Note that the set of generated invariants is restricted to the ones expressible
in the language of the theory $\Re$. A program that performs multiplication by
repeated addition, for example, never uses the multiplication operator, but any
expression that describes the set of reachable states typically would use the
multiplication operator.

In summary, we present a technique for generation of inductive invariants
using a combination of least and greatest fixed point computations of the forward
and backward propagation operators. With obvious modifications, the results
can be used to strengthen invariants. Thus, any technique for generation of
invariants, inductive or not, can be incorporated with the techniques in this
paper.

Acknowledgements. We would like to thank S. Bensalem, S. Owre, Y. Lakhnech,
J. Rushby, J. Sifakis, and the referees for their helpful comments.
Abstract. Model checking has been conceived as a powerful tool for
hardware, software and protocol verification, which has its main application
fields in the development of hi-tech and safety-critical systems.
We present here a completely novel application in the field of university
administration processes, in which model checking is applied to the
verification of the coherence of syllabi and to the automated
synthesis/simulation of correct student careers under given requirements.



1 Motivations and Goals 

Recently the Italian Ministry of University and Scientific & Technological
Research (MURST, the Ministry from now on) has approved the reform of the
Italian university, which will come into force in the year 2001-2002 [MURST2000a].
The result is a complex hybrid between the Anglo-Saxon and the traditional
Italian university organizations. In order to forestall the many problems arising from
the reform, the University of Trento and CINECA ^ have started a joint project,
named SS2, whose goal is to build a model of the post-reform structure and
administration processes of a university, and to develop a new information system
for the university admission & examination offices.

One major problem with the reform evidenced by the project model is the
much increased difficulty of reasoning on regulations, syllabi and student careers.
For instance, the new autonomy given to the universities in organizing their
degrees allows them to drastically enlarge the number of degrees of freedom for
the student's career choices. Moreover, the introduction of new distinct didactic
activity types, and of the different weights in credits, has further complicated
both the planning and the verification of the student careers.

To partially cope with this problem, the project model allows for encoding 
the ordinance regulations for student careers as a set of formal rules, so that 
they can be read and understood by an automated device. This allows for an 
automatic verification of the correctness of a career wrt. the rules. 

* The first and the third author are part of the SS2 project team. The other team
members provided feedback on the formalization. Alessandro Cimatti, Marco Roveri
and Paolo Traverso from ITC-IRST provided help on Symbolic Model Checking.

^ Italian university consortium for automated computing.

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 128-142, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
In this paper we go much further, and we describe how to apply model checking
to solve the much harder problems of automatically planning/synthesizing
correct student careers under given requirements, and of automatically verifying
the coherence of syllabi wrt. the rules. The ultimate goal is to develop an
automated support tool, to be integrated with the project's information system.

The paper is organized as follows. In Section 2, we describe in detail the 
domain and the problems of both career synthesis and syllabus coherence veri- 
fication; in Section 3, we show how we encode the latter two problems as CTL 
model checking problems; in Section 4, we describe a prototype tool we have 
implemented; in Section 5, we present and discuss some preliminary empirical 
results; in Section 6, we discuss the ongoing and future work. 

As a notational remark, many of the Italian terms we use do not have a 
straightforward translation into English. This is due to the fact that there is not 
a direct correspondence between the structure and administration processes of 
Italian and Anglo-Saxon universities. (E.g., the meaning of ordinance here may 
not be the same as it is in Oxford or Harvard.) Thus we will define the meaning 
of all non-obvious terms explicitly, reporting the corresponding Italian terms. 

2 The Problem 

2.1 A Model for the Domain 

Didactic activities and syllabi. By didactic activity (“attività didattica”)
we mean any activity a student can perform to enhance his/her career. Examples
of didactic activities are courses, seminars, theses, stages, and projects. Most
didactic activities are courses. Subjects are grouped into subject areas (“settore
scientifico-disciplinare”), whose complete list is published by the Ministry
[MURST2000b]. The estimated workload related to each activity is measured
in credits. At the beginning of the year, the student has to register for a new
matriculation year (“anno di corso”), which is either the successor of the old one
(if the student is reasonably on schedule) or the old one itself (if the student
is behind schedule and decides, or is forced, to repeat the year).

A didactic activity is modeled as a record; the fields which are relevant for 
our discussion are the following: 

- the activity code;

- one activity type. Possible values are course, seminar, stage, thesis, etc.;

- one subject area;

- one weight in credits;

- the list of the codes of the prerequisite didactic activities; ^

- the minimum matriculation year at which one can perform the activity.

Some examples of courses are given in Figure 1.

^ Prerequisites obtained by transitivity are omitted.
     Code  Type    Subject area  Credit weight  Prerequisite courses  Matriculation year
 1.  CSI   course  INF01         5              {}                    1
 2.  CSII  course  INF01         5              {CSI}                 2
 3.  OOP   course  INF01         5              {CSII}                2
 4.  GEO   course  MAT03         5              {}                    1
 5.  AI    course  INF01         5              {CSI}                 3

Fig. 1. Examples of Courses.

Every year, for every degree course ordinance, the proper entity (faculty,
department, . . . ) presents a syllabus (“offerta didattica”), i.e., the list of courses
which are active that year. The other didactic activities (theses, stages, etc.) are
proposed either by the teachers or by the students themselves. (For simplicity,
from now on we will consider all didactic activities as part of the syllabus.)

Student careers. We see a student's career $C$ as an ordered list of didactic
activities performed, each tagged with the matriculation year of the student when
he/she performed it, which has to be greater than or equal to the minimum
matriculation year of the didactic activity. We say that $C$ is on a set of didactic
activities $A$ if all the didactic activities in $C$ are in $A$. When matriculating, a
student is given an empty career and is registered to the first matriculation
year. Each time a student performs a didactic activity (passes an exam, presents
successfully a seminar, defends a thesis, etc.), this activity is appended to the
student career, together with the student's current matriculation year. If a
didactic activity is a prerequisite of another, the former must occur before the
latter. For instance,

$C = \langle (GEO, 1), (CSI, 2), (CSII, 2), (OOP, 2) \rangle$    (1)

is a career on the set of courses in Figure 1.
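
The career constraints just described (membership in the syllabus, minimum matriculation year, prerequisites before dependents) can be sketched as a small validity check. The record layout follows the fields listed for didactic activities; the class and function names are assumptions made for illustration, and the subject-area codes are read from Figure 1 as best the extracted text allows.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Activity:
    code: str
    kind: str                  # activity type, e.g. "course"
    subject_area: str
    credits: int
    prerequisites: frozenset   # codes of direct prerequisites
    min_year: int              # minimum matriculation year

# Courses of Figure 1 (subject-area codes are an assumed reading of the figure).
SYLLABUS = {a.code: a for a in [
    Activity("CSI", "course", "INF01", 5, frozenset(), 1),
    Activity("CSII", "course", "INF01", 5, frozenset({"CSI"}), 2),
    Activity("OOP", "course", "INF01", 5, frozenset({"CSII"}), 2),
    Activity("GEO", "course", "MAT03", 5, frozenset(), 1),
    Activity("AI", "course", "INF01", 5, frozenset({"CSI"}), 3),
]}

def is_valid_career(career, syllabus):
    """Check a list of (code, matriculation year) pairs against the syllabus."""
    done = set()
    for code, year in career:
        activity = syllabus.get(code)
        if activity is None:                      # career must be on the syllabus
            return False
        if year < activity.min_year:              # too early in the degree
            return False
        if not activity.prerequisites <= done:    # prerequisite not yet passed
            return False
        done.add(code)
    return True

CAREER_1 = [("GEO", 1), ("CSI", 2), ("CSII", 2), ("OOP", 2)]
```

Checking only direct prerequisites is enough here, because each prerequisite must itself have been appended to the career after its own prerequisites.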



Ordinances and Rules. When matriculating, a student is associated to an
ordinance (“ordinamento didattico”), which states the regulations for the student's
career, that is, the goals to achieve and the constraints to satisfy for graduation.
In the project model this is represented by a set of formal rules for the student
career. Some examples of rules are given in Figure 2. From a syntactic viewpoint,
each rule is built on the following components:

- a quantity, in the form at least|at most N. N is an integer, called the bound;

- a unit of measure, in the form credits|units;

- an activity type. The values correspond to those of the didactic activities,
plus the value any, which matches all values;

- a scope, given by a scope type, in the form subject areas|courses|all, and a
scope list, which is a list of elements of scope type. If the latter is all (“any
scope”), then no scope list is provided.
      Quantity      Unit of Measure  Activity Type  Scope Type     Scope List
 r1   at least 180  credits          any            all
 r2   at least 90   credits          course         subject areas  {MAT01,...,MAT08}
 r3   at least 38   credits          course         subject areas  {FIS01,...,FIS08, INF01, ING-INF05}
 r4   at least 10   credits          course         subject areas  {INF01, ING-INF05}
 r5   at least 5    credits          thesis         all
 r6   at least 9    credits          stage          all
 r7   at least 2    units            course         courses        {CSI, CSII}
 r8   at most 15    credits          course         subject areas  {INF01, ING-INF05}

Fig. 2. Examples of Rules.

We call a rule explicit if its scope type is courses, implicit otherwise. Explicit
rules (e.g., $r_7$ in Figure 2) make explicit reference to course codes, which have
to be defined a priori wrt. the rules themselves. Implicit rules (e.g., $r_2$, $r_5$,
and $r_8$ in Figure 2) make no explicit reference to courses.

From a semantic viewpoint, each rule tells a student the amount of didactic
activities of given kind and scope he/she has to cash into his/her career to achieve
graduation. For instance, the intuitive meaning of the rule set in Figure 2 is that
a student has to cash into his/her career, respectively:

$r_1$: at least 180 credits on the whole (in any activities of any scope), of which:
$r_2$: at least 90 credits in courses in the subject area(s) {MAT01, ..., MAT08},
$r_3$: at least 38 credits in courses in subject area(s) {FIS01, ..., ING-INF05}, . . .

Notice the usage of the expression “of which” . This applies when the scope and 
activity type of a rule r* are subsets of the scope and activity type of another 
rule Vj, and the bound of r* is smaller than the bound of Vj. (Remarkably, 
explicit scopes are treated as subsets of the scopes given by the corresponding 
subject areas; e.g., ‘‘‘‘courses {C SI ,C Sliy^ is treated as a subset of ‘‘‘‘subject 
areas {I N FOiyS) Some rules with explicit scopes — like, e.g., rr — can be used 
to force the student to perform some courses. We call such courses mandatory , 

We divide the rules into goal rules and constraint rules. We call a goal rule 
any rule in the form “at least N ...”. (Typically most rules in ordinances, 
often all, are goals.) A career satisfies a goal rule when the sum of units of 
the activities of the desired type and scope performed is greater than or equal to 
the bound of the rule. For instance, the career (1) satisfies rule r4 in Figure 2, 
as the 2nd, 3rd and 4th courses provide 5 + 5 + 5 ≥ 10 credits in the subject 
area INF01. The goal rules are not satisfied at the beginning of a career and 
must be satisfied when the student graduates. When a career satisfies a goal, all 
extensions of that career satisfy it. 

We call a constraint rule any rule in the form “at most N ...”. (Typically 
very few rules in ordinances, often none, are constraints.) A career violates 
a constraint rule when the sum of units of the activities of the desired type and 
scope performed is greater than the bound of the rule. For instance, in the career 
(1), the student cannot pass the exam of the sixth course of Figure 1, because 
doing so would violate rule r8 in Figure 2, as the courses 2, 3, 4 and 5 would 
provide 5 + 5 + 5 + 5 > 15 credits in the subject area INF01. The constraint 
rules are not violated at the beginning of a career and none should be violated 
until the student graduates. If a career violated a constraint, all extensions of 
that career would violate it. 

Given a career C and a rule set R, we say that: 

(i) C violates R if it violates one constraint rule c_j ∈ R; 

(ii) C satisfies R if it satisfies every goal rule g_k ∈ R and violates no constraint 
rule c_j ∈ R. 

C neither satisfies nor violates R if neither of the two conditions above holds. Notice 
that a rule set is interpreted as a conjunction of rules. 
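
The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Activity` and `Rule`, the helper names, and the sample course codes are hypothetical and not part of the project model; "units" are read as one unit per activity; and the treatment of explicit scopes as subsets of subject areas is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Activity:
    code: str     # course code (hypothetical values below)
    kind: str     # activity type: "course", "thesis", "stage", ...
    area: str     # subject area, e.g. "INF01"
    credits: int  # credit weight


@dataclass(frozen=True)
class Rule:
    quantity: str                    # "at least" (goal) or "at most" (constraint)
    bound: int
    unit: str                        # "credits" or "units"
    kind: str                        # activity type, or "any"
    scope_type: str                  # "subject areas", "courses" or "all"
    scope: FrozenSet[str] = frozenset()

    def matches(self, a: Activity) -> bool:
        if self.kind not in ("any", a.kind):
            return False
        if self.scope_type == "all":
            return True
        key = a.area if self.scope_type == "subject areas" else a.code
        return key in self.scope

    def amount(self, career: Iterable[Activity]) -> int:
        # Credits are summed by weight; units count one per activity (assumption).
        return sum(a.credits if self.unit == "credits" else 1
                   for a in career if self.matches(a))


def violates(career, rules) -> bool:
    """True if some constraint rule ("at most N") is exceeded."""
    return any(r.quantity == "at most" and r.amount(career) > r.bound
               for r in rules)


def satisfies(career, rules) -> bool:
    """True if every goal rule is met and no constraint rule is violated."""
    return not violates(career, rules) and all(
        r.amount(career) >= r.bound for r in rules if r.quantity == "at least")


# Rules r4 and r8 of Figure 2, plus a hypothetical career of 5-credit courses.
inf = frozenset({"INF01", "ING-INF05"})
r4 = Rule("at least", 10, "credits", "course", "subject areas", inf)
r8 = Rule("at most", 15, "credits", "course", "subject areas", inf)
career = [Activity(f"C{i}", "course", "INF01", 5) for i in range(3)]
extra = Activity("C3", "course", "INF01", 5)
```

With three such courses the career satisfies {r4, r8}; adding `extra` pushes the INF01 total to 20 credits, so the extended career violates r8, and so would every further extension.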

2.2 A Model for the Problems 

Career planning/synthesis. A student has to plan his/her career when 
matriculating, and possibly to re-plan it at every new year's registration, according to 
the rules he/she is given, the activities available, plus his/her own desiderata. 
(In many universities, such a plan must be presented and renewed explicitly.) 
Desiderata are mainly represented in terms of career elements (e.g., 
(OOP, 2) means “I want to pass OOP at the second matriculation year”), which 
we call self-imposed didactic activities. It is also possible to add to the rules of 
the ordinance self-imposed rules like, e.g., “At most 180 credits any all”. 

Because of the overlapping rule scopes, the different didactic activity types 
and credit weights, and the high number of degrees of freedom, the planning task 
may be rather complicated and error-prone. Thus, it is highly desirable 
to provide the students with a sort of electronic advisor of studies (EAS), an 
interactive support tool interfaced with the information system which, at each 
step, 

(a) displays the student's current career and status wrt. the rule set and 
self-imposed didactic activities, and the complete list of activities available; 

(b) allows the student to input and edit his/her own set of desiderata, and 
simulate interactively the evolution of his/her career; 

(c) can synthesize automatically careers which satisfy both the rule set and the 
desiderata. 

The EAS can also be used by a secretary to verify the correctness of a complete 
student's career wrt. the ordinance rules. 

While steps (a) and (b) can be implemented with the standard technology 
of information systems, step (c) cannot, and it requires an external tool. The 
kernel of such a tool is an exhaustive search engine which, given the syllabus A, 
the rule set R, the current career C, plus the student's set of self-imposed didactic 
activities A', searches for a career C', extending C and including all elements of 
A', which satisfies R. This can be a very hard task, as it may require exploring 
all possible career combinations before finding one, or establishing that there is none. 

Remark 1. A syllabus lists only the courses which are taught in the current year, 
and makes no statement about the future. Thus, any plan for a student's future 
career is necessarily based on the implicit assumption that the syllabus will not 
be modified in future years. Luckily, the course list of the syllabus typically 
does not change much from year to year and, if it does, changes concern only 
minor optional courses. (Mandatory courses are set by ordinance rules.) 
In general, in this paper we assume that syllabi are static. We are currently 
experimenting with an implementation with dynamic syllabi. □ 



Verifying the coherence of a syllabus wrt. a rule set. Defining the rule 
set of an ordinance, and defining new syllabi, are hard tasks. In fact, the rule 
set of an ordinance constrains not only the student careers, but also the syllabi. 
For instance, if the rule set defines some mandatory courses, the syllabi are 
forced to provide such courses; e.g., the goal r7 of Figure 2 forces all syllabi to 
provide a couple of courses with code CSI and CSII, and corresponding subject 
area INF01. More generally, syllabi must always be coherent with the rule set, 
that is, they must always provide enough courses to give every student who 
has never violated a constraint a chance to complete his/her degree course and 
graduate. We formulate the latter fact as follows. 

Definition 1. Given a rule set R and a set of didactic activities A, we say that 
A is coherent wrt. R if, for every student's partial 
career C on A which does not violate R, there exists at least one career on A 
extending C which satisfies R. 

Verifying exhaustively the coherence of the syllabi is in many cases out of the 
reach of a human mind. Thus, it would be highly desirable to provide a support 
tool able to verify it automatically. 

In general, verifying the coherence of syllabi with the rule sets is by far out 
of the reach of standard technology of information systems, for it requires a 
(double) exhaustive search engine. In fact, the basic step is similar to that of 
the EAS, that is, to search for a new career C' extending C which satisfies R. 
As before, this means exploring up to all possible career combinations before 
finding one. Much worse, this must be done for every student's partial career C, 
which again requires exploring up to all possible combinations. 

Remark 2. One may wonder whether Definition 1 is too restrictive. Consider, 
for instance, the case in which the courses OOP and AI in Figure 1 had no 
prerequisite. Then the partial career C = {OOP, AI} would not violate r8, but 
there would be no career C' extending C satisfying r7 without violating r8. Thus, 
according to Definition 1, the syllabus would not be considered coherent with 
the rule set. On the other hand, one can reply that, given a constraint c_j, if 
we provide a set of didactic activities {a_1, ..., a_n} containing mandatory courses 
which is big enough to violate c_j, we should also introduce prerequisites to force 
the student to perform all the mandatory courses before violating the constraint. 
We will further discuss the topic in Section 4. □ 
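
Definition 1 and the situation of Remark 2 can be illustrated by a brute-force check on a toy syllabus. The sketch below makes several simplifying assumptions: matriculation years and prerequisites are ignored, so every subset of activities counts as a partial career; rules are plain tuples; and the four courses are given a hypothetical 5-credit weight in subject area INF01 (Figure 1 is not reproduced here). The function names are illustrative only.

```python
from itertools import combinations


def amount(career, rule, acts):
    """Credits (or units) of the activities in `career` matched by `rule`."""
    _, _, unit, scope = rule
    return sum(acts[a][0] if unit == "credits" else 1
               for a in career if a in scope or acts[a][1] in scope)


def violates(career, rules, acts):
    return any(r[0] == "at most" and amount(career, r, acts) > r[1]
               for r in rules)


def satisfies(career, rules, acts):
    return not violates(career, rules, acts) and all(
        amount(career, r, acts) >= r[1] for r in rules if r[0] == "at least")


def is_coherent(acts, rules):
    """Definition 1 by brute force; years and prerequisites are ignored."""
    codes = sorted(acts)
    careers = [frozenset(c) for n in range(len(codes) + 1)
               for c in combinations(codes, n)]
    complete = [c for c in careers if satisfies(c, rules, acts)]
    # Every non-violating partial career must extend to a satisfying one.
    return all(any(c <= full for full in complete)
               for c in careers if not violates(c, rules, acts))


# Hypothetical data: code -> (credits, subject area).
acts = {"CSI": (5, "INF01"), "CSII": (5, "INF01"),
        "OOP": (5, "INF01"), "AI": (5, "INF01")}
r7 = ("at least", 2, "units", {"CSI", "CSII"})
r8 = ("at most", 15, "credits", {"INF01"})
```

Under these assumptions `is_coherent(acts, [r7, r8])` is false, because {OOP, AI} does not violate r8 but cannot be extended with both CSI and CSII without exceeding 15 credits. Dropping AI from the syllabus makes the check succeed. The enumeration is exponential in the number of activities, which is why an explicit check of this kind does not scale.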

3 Formalization into Model Checking 

The main idea of this paper is to encode and solve the problems of the synthesis of 
student careers and of the verification of the coherence of the syllabi wrt. rule sets 
as CTL model checking problems. For lack of space, we omit any description 
of CTL and OBDD-based CTL model checking, which can be found in, e.g., 
[Clarke et al. 1986; McMillan 1993]. 



3.1 Career Evolutions as a Finite State Machine 

We represent all the possible evolutions of a student's career within a given set 
of didactic activities as a finite state machine (FSM from now on). Consider 
Figure 3. Broadly speaking, a state is characterized by the set of the 
activities performed so far. In the initial state no activity has been performed; each 
transition represents the performance of a new activity, and adds it to the career. 

More in detail, let A be the set of didactic activities a_i of a given syllabus; 
let R be the rule set of a given ordinance; let g_j and c_k denote the goal rules and 
the constraint rules in R respectively. We represent all the possible evolutions 
of a student's career within A as an FSM M. 

The state variables of M are given by: 

- an array v of booleans, one for each didactic activity a_i in A, such that v[i] 
is true if and only if the didactic activity a_i has been performed; 

- a bounded integer $y \in \{1, \ldots, y_{max}\}$, representing the student's current 
matriculation year; 

- an array b of bounded integers, one for each rule r_j in R, s.t. b[j] is the sum 
of the credit/unit weights of the activities in the current career which match 
the type and scope of r_j; 

- an array p of booleans, one for each a_i in A, s.t. p[i] is true if and only if a_i 
satisfies its matriculation year and prerequisite constraints. 

Notice that b and p are not state variables in the strict sense, as their values 
derive deterministically from the values of v and y. A state is uniquely identified 
by the values of v and y, so that the size of the state space is upper-bounded by 
$y_{max} \cdot 2^{\|A\|}$, $\|A\|$ being the number of didactic activities in A. 

In the initial state of the FSM M, the matriculation year y is 1; every boolean 
v[i] is set to false (no activity performed); every bounded integer b[j] is set 
to 0; for each a_i, p[i] is set to true if the minimum matriculation year of a_i is 1 and 
a_i has no prerequisite courses, to false otherwise. 

Fig. 3. The FSM representing all the possible evolutions of a student's career 
within a given syllabus. Within each state, “{}” is the set of didactic 
activities performed. (For simplicity, information regarding matriculation years and 
prerequisites is ignored.) 

The transition relation of M is defined in such a way that: 

- one boolean v[i], corresponding to the performed activity a_i, passes from 
false to true, while the others keep their values. If all v[i]'s are true, then they 
all keep their values; 

- y may either keep its value or be incremented by 1. If y equals its bound 
ymax, then it is not changed. If all v[i]'s are true and y is smaller than the 
bound, it is incremented by 1. 

The other variables are automatically updated from the new values of v and y 
in the following way: 

- for each rule r_j in R, if the performed activity a_i matches the type and scope 
of r_j, then b[j] is incremented by the weight of a_i, otherwise b[j] keeps its 
value; 

- for each a_k, p[k] is set to true if a_k's minimum matriculation year is smaller 
than or equal to y and a_k has no prerequisite course a_l s.t. v[l] is false; it is set 
to false otherwise. 
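
The transition relation and the derived updates can be sketched as an explicit successor function. The sketch is hedged in several ways: the names `Act`, `State`, `p` and `successors` are hypothetical; rules are reduced to sets of subject areas counted in credits; p[i] is used as a guard on which activity may be performed; and, as an assumption not stated in the text, when no unperformed activity is enabled only the year advances.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Act(NamedTuple):
    weight: int              # credit weight
    min_year: int            # minimum matriculation year
    prereqs: FrozenSet[str]  # codes of prerequisite courses
    area: str                # subject area


class State(NamedTuple):
    v: FrozenSet[str]        # activities performed so far
    y: int                   # current matriculation year
    b: Tuple[int, ...]       # one credit counter per rule (derived from v)


def p(acts, s, code):
    """p[i]: the activity meets its minimum-year and prerequisite constraints."""
    a = acts[code]
    return a.min_year <= s.y and a.prereqs <= s.v


def successors(acts, scopes, y_max, s):
    """Yield the successor states of s; scopes[j] is the set of subject
    areas matched by rule r_j (credit-based rules only, for brevity)."""
    years = sorted({s.y, min(s.y + 1, y_max)})  # keep y, or increment it
    todo = [c for c in sorted(acts) if c not in s.v and p(acts, s, c)]
    for code in todo:
        a = acts[code]
        # b[j] grows by the weight of the performed activity if it matches r_j.
        b = tuple(bj + a.weight if a.area in scope else bj
                  for bj, scope in zip(s.b, scopes))
        for y in years:
            yield State(s.v | {code}, y, b)
    if not todo:
        # Assumption: with nothing left to perform, only the year advances
        # (and the state loops on itself once y has reached its bound).
        yield State(s.v, min(s.y + 1, y_max), s.b)
```

For example, with a first-year course CSI and a second-year course CSII that has CSI as prerequisite, the initial state has exactly two successors: CSI performed with y = 1 and CSI performed with y = 2.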

The goal and constraint rules in R are the atomic propositions of the CTL 
formulas representing the specifications. Thus, for every state s of M and for 
every goal rule g_k and constraint rule c_j in R, we say that 

    $M, s \models g_k \iff b[k](s) \geq \mathrm{bound}(g_k)$, 
    $M, s \models c_j \iff b[j](s) \leq \mathrm{bound}(c_j)$,        (2) 

where b[i](s) denotes the value of b[i] in s. Intuitively, this means that the goal 
rule g_k “at least N credits ...” is true in the state s of the FSM M if and only if, 
in the state s, the amount of credits b[k] matching the scope and type of g_k is 
greater than or equal to N, and that the constraint rule c_j “at most N credits ...” 
is true in the state s of the FSM M if and only if, in the state s, the amount of 
credits b[j] matching the scope and type of c_j is smaller than or equal to N. 

Fig. 4. The FSM representing all the possible evolutions of a student's career 
starting from a given partial career. (For simplicity, information regarding 
matriculation years and prerequisites is ignored.) 

To handle the case when the evolution starts from a given partial career C, 
we modify M by forcing an initial deterministic behavior until the given 
career C is emulated. This is represented in Figure 4. Let C be the career 
{(a_{i_1}, y_{i_1}), ..., (a_{i_N}, y_{i_N})}. At the k-th step, k ≤ N, M “chooses” 
deterministically the didactic activity a_{i_k}, sets v[i_k] to true and y to y_{i_k}, and updates b 
and p accordingly. After the N-th step, M starts behaving in the usual 
non-deterministic way. To handle the presence of self-imposed didactic activities 
(a_i, y_i) in A', we modify M by forcing the deterministic choice of a_i as soon as 
possible when y ≥ y_i.^1 

Property 1 (Monotonicity). Assume true > false. Let s and s' denote two 
generic states of the FSM M described above such that s' is a successor of s. Let x 
and x' denote the values of the generic variable x in s and s' respectively. Then we 
always have that, for every i and j, y' ≥ y, v[i]' ≥ v[i], b[j]' ≥ b[j], p[i]' ≥ p[i]. 
We say that M is monotonic. □ 

A consequence of the definition of M is that, except for the very last state s*, 
where all v[i]'s are true and y equals its bound, it is always the case that y' > y 
or v[i]' > v[i] for one i, that is, the monotonicity of M is “strict”. Thus we have 
no loops except in s*, which is the successor of itself (see Figure 3). 

From (2) and Property 1 we have that, for every goal rule g_k and constraint 
rule c_j in R, 

    $M, s \models g_k \implies M, s \models \mathrm{AG}\, g_k$, 
    $M, s \models \neg c_j \implies M, s \models \mathrm{AG}\, \neg c_j$.        (3) 



This matches the intuitive statements of Section 2.1: when a career satisfies 
a goal, all extensions of that career satisfy it, and when a career violates a 
constraint, all extensions of that career violate it. 

^1 This means interpreting (a_i, y_i) as “I want to perform a_i as soon as possible in the 
y_i-th matriculation year”. Anyway, as self-imposed didactic activities can be used 
only in career synthesis, this is not a significant restriction. 

3.2 Encodings of the Different Problems 

Let $C =_{def} \bigwedge_j c_j$ and $G =_{def} \bigwedge_k g_k$, where c_j and g_k denote the constraints and 
goals in R. Let s_0 denote the initial state of M. We represent the different kinds 
of problems described in Section 2.2 as CTL model checking problems by means 
of different specification formulas. 

Career synthesis. The problem of synthesizing a career matching the rules 
and desiderata is a typical reachability problem: given a set of initial states I, a 
set of goal states F (the states which verify all goals and constraints) and a 
transition relation T, find a path s_0, ..., s_n such that s_0 ∈ I, s_n ∈ F and T(s_{i-1}, s_i) 
holds for every i ∈ {1, ..., n}. Following the approach of [Cimatti et al. 1998], we 
model the problem as: 

    $M, s_0 \models \mathrm{AG}\, \neg(C \wedge G)$.        (4) 

The CTL specification formula means “invariantly, at least one rule is false”. 
The model checker tries to verify this property exhaustively and, when it finds 
a counter-example (a state s_n where all rules are true), it returns a path π leading 
to it. Property (3) guarantees that, if a constraint is not violated in s_n, then 
it is not violated in any state of the path. Thus a returned path π represents 
the progression of a career which satisfies all goals and violates no constraint. If 
a breadth-first search strategy is used (as in standard OBDD-based CTL model 
checking), then the length of the career returned is minimal. If the synthesis 
starts from a partial career C, or it contains self-imposed didactic activities, M 
is modified as described in the previous section. 
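
The reachability reading of career synthesis can be illustrated with an explicit-state breadth-first search. This is only a sketch and not the OBDD-based procedure used by the model checker: the function names and the sample syllabus are hypothetical, years and prerequisites are ignored, and states violating a constraint are pruned, which is an added shortcut justified by property (3).

```python
from collections import deque


def holds(v, rule, acts):
    """Truth value of a rule (kind, bound, unit, scope) in the career v.
    scope is None for "all", else a set of subject areas or course codes."""
    kind, bound, unit, scope = rule
    total = sum(acts[a][0] if unit == "credits" else 1 for a in v
                if scope is None or a in scope or acts[a][1] in scope)
    return total >= bound if kind == "at least" else total <= bound


def synthesize(acts, rules, start=frozenset()):
    """Breadth-first search for a shortest extension of `start` in which
    all rules are true; returns the list of added activities, or None."""
    constraints = [r for r in rules if r[0] == "at most"]
    first = frozenset(start)
    queue, seen = deque([(first, [])]), {first}
    while queue:
        v, path = queue.popleft()
        if not all(holds(v, c, acts) for c in constraints):
            continue  # a violated constraint stays violated in all extensions
        if all(holds(v, r, acts) for r in rules):
            return path
        for a in sorted(acts):
            nxt = v | {a}
            if a not in v and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    return None


# Hypothetical syllabus: code -> (credits, subject area).
acts = {"CSI": (5, "INF01"), "CSII": (5, "INF01"), "OOP": (5, "INF01"),
        "AI": (5, "INF01"), "ALG": (5, "MAT01")}
rules = [("at least", 20, "credits", None),
         ("at least", 2, "units", {"CSI", "CSII"}),
         ("at most", 15, "credits", {"INF01"})]
```

Because states are visited in order of increasing career length, the first career found is of minimal length, mirroring the remark on breadth-first search above. Starting from the partial career {OOP, AI}, the search returns `None`, since no extension can include both mandatory courses within the 15-credit limit.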

Verifying the coherence of a syllabus. Following Definition 1, we model the 
problem of verifying the coherence of a syllabus as follows: 

    $M, s_0 \models \mathrm{AG}(C \to \mathrm{E}(C \,\mathrm{U}\, (C \wedge G)))$.        (5) 

The CTL specification formula means “invariantly, if in a state s all constraints 
are true, then there exists a path π starting from s in which all constraints are 
always true and eventually all goals become true.” The intermediate state s 
represents that of the partial career C in Definition 1, and the path π represents 
the career continuation C' \ C. Because of the monotonicity property (3), (5) can 
be simplified into: 

    $M, s_0 \models \mathrm{AG}(C \to \mathrm{EF}(C \wedge G))$.        (6) 

Moreover, as the goals are monotonic (3) and the FSM has no loop except for 
the one in the final state s*, if there is no constraint rule, then (6) can be further 
simplified into: 

    $M, s_0 \models \mathrm{AF}\, G$,        (7) 

which means “for all paths, eventually all goals become true”. 
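
To make the meaning of specification (5) concrete, the following sketch evaluates it on a small explicit graph by a least-fixpoint computation of E(C U (C ∧ G)). Symbolic model checkers compute the same kind of fixpoint over OBDDs rather than explicit sets; the function names and the five-state graph are hypothetical.

```python
def reachable(succ, s0):
    """All states reachable from s0 in the graph succ: state -> successors."""
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        for t in succ[s]:
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def eu(succ, hold, target):
    """States satisfying E(hold U target), computed as a least fixpoint."""
    z = set(target)
    changed = True
    while changed:
        changed = False
        for s, ts in succ.items():
            if s not in z and s in hold and any(t in z for t in ts):
                z.add(s)
                changed = True
    return z


def coherent(succ, s0, c_states, g_states):
    """Check AG(C -> E(C U (C & G))) from s0, i.e. specification (5)."""
    good = eu(succ, c_states, c_states & g_states)
    return all(s in good for s in reachable(succ, s0) if s in c_states)


# Hypothetical FSM: state 4 satisfies the goals but violates a constraint.
succ = {0: [1, 2], 1: [3], 2: [4], 3: [3], 4: [4]}
c_states = {0, 1, 2, 3}   # states where all constraints are true
g_states = {3, 4}         # states where all goals are true
```

Here state 2 satisfies all constraints but can only move to state 4, so the check fails from state 0. Removing the edge from 0 to 2 makes state 2 unreachable and the check succeeds.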
Fig. 5. Schema of the prototype tool. 



4 A Prototype Implementation 

The project information system will be ready no earlier than the second half 
of 2001. In the meantime, to verify the feasibility of the approach, we have 
implemented a simple prototype tool interfaced with a new version of the SMV 
symbolic model checker [McMillan 1993]. The tool provides a set of commands 
and options to perform the tasks described in Section 3. It takes as input a set 
of didactic activities A, a rule set R, and, in case of career synthesis, an optional 
partial career C and an optional set of self-imposed didactic activities A'. The 
output depends on the kind of problem addressed: in case of career synthesis, 
either it returns a career extending C and including all elements of A' which 
satisfies R, or it fails; in case of syllabus coherence verification, it returns true 
if A is coherent wrt. R, false otherwise. 

The schema of the tool is reported in Figure 5. An Encoder generates from 
the input an SMV input file describing the FSM M and the CTL specification 
formula, as in Section 3, plus a Table, keeping track of the symbol encodings. 
A Decoder converts the SMV output file into a readable output format using 
the Table; in particular, it converts an output path of states into a career. 
The search engine of the tool is the symbolic model checker, which is used as 
a black box. The choice of an OBDD-based CTL model checker was dictated by 
the particular formalization of the problems, as in Section 3, and by the very 
regular structure and high symmetry of the FSM M. 

The tool provides some options to improve its performance. First, we notice 
that a significant source of resource consumption is the presence of many 
counters b[j]: OBDD-based model checkers handle (bounded) integers with 
difficulty, as they have to be encoded into their bitwise representations. Thus, if 
the corresponding option is set, the tool applies the following reduction: if the greatest 
common divisor (GCD) n of the credit weights of all the didactic activities in 
A is strictly greater than 1, then all the credit values are divided by n by the 
Encoder and the results are re-multiplied by n by the Decoder. This allows 
the model checker to handle smaller integer values. For the goal and constraint 
rule bounds b, we take respectively the ceiling $\lceil b/n \rceil$ and the floor $\lfloor b/n \rfloor$. 
In fact, e.g., if the weights of all didactic activities are multiples of 5, “at least 9” 
is equivalent to “at least 10”, and “at most 7” is equivalent to “at most 5”. We 
call this option GCD. 
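
The reduction can be sketched as follows. The function name `gcd_reduce` and its argument layout are hypothetical, and the sketch only covers bounds expressed in credits.

```python
from functools import reduce
from math import gcd


def gcd_reduce(weights, goal_bounds, constraint_bounds):
    """Divide credit weights by their GCD n; goal bounds ("at least b")
    become ceil(b/n), constraint bounds ("at most b") become floor(b/n)."""
    n = reduce(gcd, weights)
    if n <= 1:
        # Reduction not applicable: leave everything unchanged.
        return 1, list(weights), list(goal_bounds), list(constraint_bounds)
    return (n,
            [w // n for w in weights],
            [-(-b // n) for b in goal_bounds],      # ceiling division
            [b // n for b in constraint_bounds])    # floor division
```

With weights 5, 5 and 10 the divisor is 5, so “at least 9” becomes “at least 2” (that is, 10 credits) and “at most 7” becomes “at most 1” (that is, 5 credits), in line with the example above. With weights 5, 6 and 9 the divisor is 1 and nothing changes.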

Moreover, the number of possible paths may be huge, and most are groupwise 
identical modulo order permutations. We notice that non-mandatory didactic 
activities cannot be prerequisites for mandatory courses (this fact is checked 
by the Encoder). Thus, one may want to restrict the search to careers in 
which mandatory courses are performed as soon as possible, that is, at the 
beginning of their corresponding minimum matriculation year. Moreover, from 
the students' viewpoint, non-mandatory didactic activities are the only matter 
of choice in a career, so that, once the mandatory courses are clustered at the 
beginning of their matriculation years, their relative order within each cluster can 
be considered insignificant. Thus, one may want to impose a fixed order among 
the clustered mandatory courses, which is compatible with the prerequisites. 
We call this option Restrict. Notice that the Restrict option allows the model 
checker to avoid situations like the one highlighted in Remark 2. 

Finally, if the corresponding option is set, all rules stating mandatory courses 
(like, e.g., r7 in Figure 2) can be merged into one rule, thus requiring only one 
counter. We call this option Merge. 

5 Preliminary Empirical Results 

The reform will come into force in the year 2001-2002, so the universities 
have not yet presented their post-reform ordinances and syllabi. Thus, to 
verify the feasibility of the approach, we have modeled as a test case the proposed 
post-reform ordinances and syllabi of the Mathematics and Physics degrees 
of the University of Trento.^2 Both degree course ordinances require 180 credits 
on the whole, of which at least 5 credits are for a thesis and 9 are for a stage, 
with no scope restriction. Both rule sets contain no constraint rule. The Math 
syllabus offers 55 5-credit courses, among which 22 are mandatory; the Physics 
syllabus offers 39 5-credit courses, among which 25 are mandatory. 

As stages and theses are not explicitly inserted in the syllabi, we have added 
them in two distinct ways: in the first (Math1 and Phys1) we add to the syllabi 
one 5-credit didactic activity of type thesis and a 10-credit one of type stage; 
in the second (Math2 and Phys2) we add one 5-credit and one 
6-credit didactic activity of type thesis and one 9-credit and one 10-credit didactic 
activity of type stage; a constraint rule “at most 1 unit [of type] thesis” is added 
in order to cope with the uniqueness of the thesis. 

The results of the empirical tests are summarized in Table 1. (All tests have 
been obtained by running the model checker on a dual-processor Pentium III 
667MHz PC with 1GB RAM and Debian Linux; the RAM consumption and CPU time 
required by the Encoder and the Decoder are negligible wrt. those of the 
model checker.) We have considered both the problems of career synthesis and 
syllabus coherence verification. For all problems, we have analyzed all possible 
combinations of the GCD, Restrict and Merge options. (For Math2 and Phys2, 
the GCD option is ineffective, so that the values in the left half of the table are 
pairwise identical to those of the right half.) 

Table 1. The results of the tests. “???” means “Exceeding 600MB RAM 
consumption”. (Here “K” and “M” denote 10^3 and 10^6 rather than 2^10 and 2^20.) 

^2 Available at http://www-math.science.unitn.it/CCLM/ (in Italian). As they are 
just proposals, they may eventually change wrt. the current version (October 2000). 

On the one hand, we notice that the problems are intrinsically very hard: with 
all the options disabled, no problem can be solved within 600MB of RAM, 
as the size of the state space, and thus of the BDDs, tends to explode. 
The same happens when both Restrict and Merge options are off. On the other 
hand, with all the options set, all problems are well within the reach of the model 
checker, and mostly solved in a few seconds. In between, we notice that: 

- with the GCD option on, when applicable (Math1 and Phys1), there are 
mostly appreciable, but not dramatic, performance improvements. In fact, it 
reduces the number of boolean variables necessary to encode bounded integers, 
but the reduction is only logarithmic in the value of the GCD; 

- with the Restrict option on, the improvements are very relevant. In fact, 
it reduces the number of non-deterministic choices and restricts the search 
only to non-mandatory didactic activities, causing a very relevant reduction 
of the size of the state space; 

- with the Merge option on, there are mostly relevant performance 
improvements. In fact, it allows for a reduction of the number of counters, i.e., of 
the boolean variables encoding them, and reduces the size of the corresponding 
BDD. 
Fig. 6. Results obtained with Math1, GCD=on, Merge=on and Restrict=off, 
adding either a partial career or a set of self-imposed didactic activities of 
increasing size. Y axis: (Left) CPU time in seconds; (Right) number of BDD nodes. X 
axis: number of didactic activities inserted. 

We notice that Math2 and Phys2 are always harder or much harder than Math1 
and Phys1 respectively. The reasons for this fact are manifold: first, we 
have two more non-mandatory didactic activities; second, the new 6-credit 
and 9-credit didactic activities hinder the applicability of the GCD option 
(gcd(5, 6, 9) = 1); finally, for the syllabus verification problem, with Math1 and 
Phys1 we use the simplified encoding (7), while with Math2 and Phys2 the 
constraint rule added forces the usage of the much harder encoding (6). 

As a side observation, we also notice that the problem of career synthesis is 
always easier or much easier than the corresponding problem of syllabus 
verification. In fact, when both problems have a solution, the latter requires an exhaustive 
exploration of the whole search space, while the former can stop when it finds 
one path. 

For the problem of career synthesis, we wonder how the efficiency changes 
when we add partial careers and self-imposed didactic activities. To provide 
an intuition, in Figure 6 we have taken the problem Math1 with GCD=on, 
Merge=on and Restrict=off, and we have added either a partial career or a set 
of self-imposed didactic activities of increasing size. Both the CPU time and the 
number of OBDD nodes decrease significantly with the size of both the partial 
career and the set of self-imposed didactic activities. In fact, as with Restrict=on, 
the pre-compiled presence of the input didactic activities in the career reduces 
the number of non-deterministic choices and thus restricts the search, causing a 
relevant reduction of the size of the state space. 

6 Ongoing and Future Work 

The currently implemented prototype is actually more complex than we have 
described. First, it can handle multiple choices for the credit weight of one 
didactic activity. E.g., for theses, this prevents introducing extra constraints like 
the one in Section 5, which is the main source of the performance gaps between 
Math1, Phys1 and Math2, Phys2. Second, it can also handle dynamic syllabi. 
This feature, which is still at an experimental level and has to be further 
investigated, requires handling some extra information, like the range of years in which 
an exam can be passed and the equivalence relations among courses. The key 
future research step, however, will be to move to a direct integration with a 
symbolic model checker instead of the current black-box usage. This should allow 
for customizing the model checking engine to exploit the very peculiar features 
of the FSM, and for improving the level of interactivity of the career synthesis 
process. 

We may wonder whether our approach is the right one for this application. 
For the problem of syllabus coherence verification, we need a language which, 
like CTL, is expressive enough to represent boolean and temporal information 
as well as constraints on bounded integers, and we need a search technique 
which, like OBDD-based Model Checking, is able to exploit the very regular 
structure of the FSM to cope with the combinatorial explosion of the state space. 
Therefore, although we cannot exclude a priori the existence and effectiveness 
of other approaches, we believe that symbolic model checking is a natural and 
very effective way to encode this problem. 

For career synthesis, the problem is significantly simpler, and we are also 
considering other approaches like planning combined with linear programming 
[Wolfman & Weld 1999] and SAT-based model checking [Biere et al. 1999], with 
some tricks from [Giunchiglia et al. 1998] to exploit the dependencies among the 
state variables of the FSM. 
Abstract. Vortex is a workflow language to support decision making 
activities. It centers around gathering and computing attributes of 
input objects. The semantics of Vortex is declarative, and the dependency 
graphs of Vortex programs are acyclic. This paper discusses the 
application of symbolic model checking techniques to verification of Vortex 
programs. As a case study we used a Vortex program MIHU for online 
customer support. The control structure and the declarative semantics of 
Vortex programs enabled us to develop various optimization techniques 
for the purpose of verification. These techniques include constructing a 
disjunctive transition BDD, variable pruning, projection of initial 
constraints, and predicate abstraction. 



1 Introduction 

A workflow management system provides a mechanism for organizing the 
execution of multiple tasks, typically in support of a business or scientific process. 
A variety of workflow models and implementation strategies have been proposed 
[8,19]. Workflows concentrate on the control and coordination of tasks performed 
by software systems or humans. Workflows are typically represented using some 
form of directed graphs [8,6], often based on variations of Petri nets. A workflow 
specification describes both data and control flows between tasks as well as the 
application programs that implement the tasks. In contrast, recent work in the 
area of scientific workflows emphasizes the need to promote a dataflow view of 
the workflow at the specification level. [1] presented an “object view” where the 
focal point is the data used and generated during workflow execution. There, 
workflows are considered as graphs of objects with the processes that created 
them being expressed through the links between them. A mixed view is 
proposed in [14], wherein both data and control flow are expressible and the user 
can navigate from an activity to its input data structure. 

Many workflow applications require highly differentiated treatments for 
different kinds of inputs. A substantial class of examples arises in customer care 
(e.g., e-commerce, insurance claims processing) where enterprises attempt to 
provide goods and services to a mass market. Such individualized treatments 
can cater to the individual preferences of customers, and can support targeted 
marketing initiatives and promotions by the host enterprises. This is especially 
important in connection with establishing and maintaining a loyal customer base, 
a cornerstone to success in business in general and e-commerce in particular. 
To support this need, Vortex [10] was recently developed. It focuses on 
determining attribute values; the means by which these values are obtained 
are modeled essentially as side effects of this. However, attributes in Vortex can 
model either the execution status of some task or the output data of a task. 
This approach offers a lot of flexibility, which makes it possible to focus alternatively on 
tasks or data as needed by the decision process embodied by the workflow. 
Vortex enables a declarative specification of workflows, which matches the need, 
emphasized by existing work on scientific workflows, for high-level specification 
languages for designing and prototyping workflows usable by users without 
computer training (e.g., a natural scientist). 

More and more people are getting information and services online, many
of which are supported by workflow systems. Failure of these systems can have
a potentially huge impact (e.g., the server attacks on CNN and other Internet
sites some time ago). Moreover, workflow specifications (programs) are becoming
more and more complex. For example, a Vortex workflow program [10] in
practical use may consist of hundreds of variables and thousands of lines of code.
It is unlikely that one can develop large workflow systems free of errors without
tools. An interesting issue here is to develop appropriate tools to aid the design
of workflow specifications. Good design tools can improve not only the quality
of workflow specifications but also the design and maintenance process.
Verification techniques can allow the designer to “debug” the specifications
of workflow processing logic. Important properties of workflow specifications
include logical properties (such as “each insurance claim is eventually approved
or disapproved”), the existence of nondeterministic behaviors, and properties
related to tasks with side effects (such as issuing a check).

Research on model checking has produced tools such as SMV [13], which have been
successfully applied to the verification of control-intensive systems [12]. Symbolic
model checking has been used in verification of systems with up to 10^^^ states.
However, a large number of variables, complex program logic, or arithmetic operations
can easily exceed the capabilities of verification tools such as SMV. The control
structure and the declarative semantics of Vortex programs provide opportuni- 
ties for various optimizations which can result in scalable verification techniques. 

In [15], model checking was applied to verification of Mentor workflow
specifications. More specifically, the focus is on properties over graph structures 
(rather than execution results). A similar approach was taken using Petri-net 
based structures in [18]. A technique for translating business processes in the 
process interchange format (PIF) to CCS was developed in [17] which can then 
be verified by appropriate tools. Clearly, a direct verification that considers not 
only the structures but also the executions is more accurate and desirable. This 
is the focus of the present paper. 

In this paper we present techniques such as constructing a disjunctive transition
BDD, variable pruning, projection of initial constraints, and predicate
abstraction for verification of Vortex programs. As a case study we use a Vortex
program “May-I-Help-You” (MIHU), a system to improve the effectiveness of
web-based storefronts. MIHU has over 40 integer attributes and consists of more
than 800 lines of Vortex code. A straightforward mapping of the MIHU program
to SMV results in a BDD too large to be computed. By introducing an
execution order on Vortex programs we were able to construct a much smaller
disjunctive transition BDD in 10 seconds. However, even using the disjunctive
transition BDD we were not able to check all the properties of the MIHU program.
Based on the dependencies among attributes we were able to develop a
variable pruning technique which only preserves the variables that are active
during the computation. This technique is motivated by the acyclic nature of
dependencies in Vortex programs and can be applied to other such systems.
Variable pruning requires the projection of the initial image to different stages
of the computation. This also reduces the size of the initial image, since complicated
predicates on various attributes, such as sorted arrays, are decomposed. Using
these techniques we were able to check all the properties of the MIHU program
using SMV.

The remainder of the paper is organized as follows. § 2 reviews the necessary
concepts of Vortex. § 3 describes a Vortex application system, “May I Help yoU”
(MIHU), which we use as a case study. § 4 gives the mapping from Vortex to SMV
and compares two different approaches to constructing an efficient transition BDD.
§ 5 presents two optimization methods: variable pruning and decomposition of
initial constraints. § 6 is an experimental exploration of predicate abstraction,
which sheds light on the exponential growth of problem size with integer width.

2 Vortex Decision Flows 

Vortex [10] is a programming paradigm for online decision making, an impor- 
tant component in workflow systems. A Vortex program focuses on information 
gathering, decision making, and launching and monitoring of external tasks. 
The Vortex language is declarative and it allows programmers to specify the 
conditions under which decisions and tasks should be performed, but the flow of 
control is not specified explicitly. As a result, Vortex programs are more succinct
and easier to analyze formally than equivalent procedural programs, and are eas- 
ier for humans to understand and modify. Because the core semantics of Vortex 
is declarative, analysis is simpler than with procedural workflow languages. 

A Vortex decision flow consists of a family of attributes that may be evaluated 
during execution. One attribute will be “target” and embody the output of 
a decision flow, e.g., what priority of service to give this customer, or what 
promotional image to display on the next web page. Other attributes correspond 
to intermediate results of the decision flow. For example, a “promo hit list” 
attribute might hold a listing of potential promo messages to display, along 
with scores combining the likelihood that a customer will buy the promo and 
the potential profit that might be derived. Some intermediate attributes might 
gather data from external sources, such as databases. Since attribute evaluation 
can have a real cost, enabling conditions are used to decide which attributes 
should be evaluated. The set of data flow and control flow dependencies in a 
decision flow must be acyclic. This and the enabling conditions restrict Vortex 
decision flows to be “monotonic” : once an attribute obtains a value the value will 
not be changed before the end of execution. This “attribute-centric” perspective 
of decision flows permits a systematic approach for specifying what factors are to
be incorporated as a decision is being made.

Individual Vortex programs are centered around the decision making and 
processing needed to react to a single event, e.g., a new claim input to an
insurance claims workflow, or a customer contact through a web-based storefront.
Programmers specify what tasks might potentially be performed for an incom- 
ing event, and specify logical conditions under which these tasks should be per- 
formed. A typical Vortex application will involve several Vortex programs, each 
for dealing with a different class of events. 

Programs in Vortex focus on how values should be assigned to the attributes 
of an input object. In particular, each Vortex execution begins with values only 
for the source attributes. As execution continues values for additional attributes 
are obtained, perhaps by computation or synthesis based on previously obtained 
attribute values, by information retrieval, or by interaction with humans or other 
software systems. Not all attributes need take values. External actions (e.g., issue 
checks) may be launched as a side-effect of attribute evaluation. 

A Vortex program includes enabling conditions to determine whether an at- 
tribute will be evaluated for a particular execution. These are similar to the 
enabling rules in Meteor [11] with a crucial difference: the enabling rules of
Meteor explicitly mention events, and can be fired only if the mentioned event(s)
occur and the remainder of the condition is true at that time. Thus, an analysis
of tasks and enabling rules in Meteor requires an understanding of the relative
timing of task executions. In contrast, the assignment of attribute values in
Vortex is monotonic (i.e., once assigned an attribute value cannot change), and 
the enabling conditions refer only to attribute values and states (i.e., whether 
an attribute has been enabled or disabled). This makes it possible for Vortex to 
have a simple declarative semantics that ignores issues around order of execution 
except for those implied by data flow constraints between tasks. 

Thus the computation of an attribute value in a Vortex program may depend
on values of other attributes (data dependencies), and the execution of the
computation depends on attributes occurring in the enabling condition (control
dependencies).

The attribute-centric paradigm and assumption of monotonicity make pos- 
sible a natural mechanism for querying the status and history of workflow pro- 
cessing. This is based on the notion of snapshot of a Vortex execution. Suppose 
a Vortex program is being executed. At a given point in time the snapshot as- 
sociated with this execution is a mapping from attributes to the current state 
of the attribute (not-yet-considered, enabled or disabled), and if the attribute is 
enabled either a value for the attribute or the “value” uninitialized. Once an 
execution completes then its snapshot can be archived. 
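
The snapshot notion can be illustrated with a small data structure. The following Python sketch is only an illustration under stated assumptions: the names Snapshot, enable, disable and assign are hypothetical and are not part of Vortex. It records the state of each attribute and rejects a second assignment, mirroring the monotonicity assumption.

```python
from enum import Enum


class State(Enum):
    NOT_YET_CONSIDERED = "not-yet-considered"
    ENABLED = "enabled"
    DISABLED = "disabled"


# Sentinel standing for the "value" uninitialized of an enabled attribute.
UNINITIALIZED = object()


class Snapshot:
    """Hypothetical model of a snapshot: attribute -> state (and value)."""

    def __init__(self, attributes):
        self.state = {a: State.NOT_YET_CONSIDERED for a in attributes}
        self.value = {}

    def _decide(self, attr, new_state):
        # The state of an attribute is decided at most once.
        if self.state[attr] is not State.NOT_YET_CONSIDERED:
            raise ValueError(f"state of {attr} already decided")
        self.state[attr] = new_state

    def enable(self, attr):
        self._decide(attr, State.ENABLED)
        self.value[attr] = UNINITIALIZED

    def disable(self, attr):
        self._decide(attr, State.DISABLED)

    def assign(self, attr, value):
        # Monotonicity: only an enabled, still uninitialized attribute takes a value.
        if self.state[attr] is not State.ENABLED:
            raise ValueError(f"{attr} is not enabled")
        if self.value[attr] is not UNINITIALIZED:
            raise ValueError(f"{attr} already has a value")
        self.value[attr] = value
```

Because states and values only move forward, a completed snapshot of this kind can be archived as is and queried later.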

Although attribute computation in Vortex can be specified in several dif- 
ferent ways including by external systems (black boxes), an interesting class 
of computation is specified by decision modules. These provide an eclectic mix 
of mechanisms for aggregating and synthesizing previously obtained attribute 
values. As a very simple illustration, suppose that multiple vendors are being 
considered in connection with a given purchase. Different factors might need to 
be weighed, e.g., the price quoted by different vendors, the availability date,
previous history with the vendor, etc. A Vortex decision module (or family of
decision modules) might be used to associate a weight to each vendor and then
pick the vendor with the highest weight. As a simplified illustration, a rule such as
“If vendor V gave lowest price quote, then contribute [V, 10]” can be used to
contribute a weight of 10 in favor of V; and the total weight for V would be
obtained by adding this to the other weights contributed for V. Vortex supports
a broad family of simple and intricate semantics for combining the contributions 
of the rules in a decision module. Decision modules provide a simple mechanism 
for using a broad variety of heuristics when combining information. This is useful 
in contexts that involve business considerations, e.g., customer care, automated 
resolution of exceptions, and reconciling dirty data. 
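
The vendor example can be sketched in a few lines of Python. This is an illustrative sketch, not Vortex syntax: the function pick_vendor, the history_bonus input and the second rule are assumptions introduced here; only the "lowest price contributes 10" rule comes from the text. Contributions are combined by addition, and the vendor with the highest total wins.

```python
from collections import defaultdict


def pick_vendor(quotes, history_bonus):
    """Sum the weights contributed for each vendor and pick the best one.

    quotes:        vendor -> quoted price
    history_bonus: vendor -> weight from previous history (hypothetical rule)
    """
    weights = defaultdict(int)
    lowest = min(quotes.values())
    for vendor, price in quotes.items():
        # Rule from the text: "If vendor V gave lowest price quote,
        # then contribute [V, 10]".
        if price == lowest:
            weights[vendor] += 10
        # Hypothetical second rule: contribute a weight based on history.
        weights[vendor] += history_bonus.get(vendor, 0)
    best = max(weights, key=weights.get)
    return best, dict(weights)
```

Other combining policies (for example taking the maximum contribution instead of the sum) would replace only the accumulation step.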

3 A Vortex System “May I Help yoU” (MIHU) 

The need for online decision making arises frequently in electronic commerce
applications. For example, many web storefronts have human agents for online
customer support, in case customers feel lost while shopping at the website.
However, in such an application the web storefront has to decide automatically who
needs the service and when it is appropriate to prompt a customer to launch
the service. A decision process behind the web server can track each shopping
session, collect data, analyze the status of a customer, and make decisions to
provide online support. MIHU is such a Vortex workflow; it runs behind a web
server to improve customer satisfaction.

The purpose of MIHU is to decide whether to provide a customer service
called AWD (Automated Web Dialog). The target attribute is a
boolean attribute offer_AWD, as shown in Fig. 1. Each time a web page is
accessed, MIHU is called by the web server before the page is delivered to
the customer. In each run, MIHU monitors several key attributes of
a customer (e.g., the business value, the frustration score, and the opportunity
score of the current session, and the current agent load of the web store). The
goal is to respond as soon as possible whenever the customer’s frustration level
becomes high or the customer has the potential to buy more. As shown in Fig. 1,
MIHU also computes some intermediate attributes such as AWD_score and
AWD_override_score during its execution. The target attribute offer_AWD is
computed using AWD_score, AWD_override_score and agent_load. Note that
there are many source attributes in the program, such as card_color, log,
and shopping_count. The source attributes are passed by the web server
or fetched from databases. The control and data dependencies between all
attributes are also shown in Fig. 1, where the graph is acyclic as explained in
Section 2.

One characteristic of Vortex programs is that they are heavily “control
intensive.” A large proportion of a Vortex program consists of case statements,
which are specified as conditions on attributes. Consider a module
in MIHU which computes the attribute frustration_aggregate.

MODULE compute_frustration_aggregate{
  enabling condition: true;
  computation: frustration_aggregate=
    eval_rules(
      //if no rule is true, the default value is 0
      policy: max_of_true_rules(0),
      rules: {
        if (sorted_vector[1] >= 100) then contribute
          2*(sorted_vector[0]+sorted_vector[1]+sorted_vector[2]);
        if (sorted_vector[0] >= 100) then contribute
          (sorted_vector[0]+sorted_vector[1]+sorted_vector[2]);
        if (sorted_vector[0] < 100) then contribute
          (sorted_vector[0]+sorted_vector[1]/2+sorted_vector[2]/4);
      }
    )
}



There are three case statements in the module, referred to as rules in Vortex.
All three rules are evaluated, and the maximum result value among the rules
whose conditions hold is assigned to the attribute frustration_aggregate. If none
of the conditions is satisfied, the attribute is assigned a default value. Note that
each Vortex module has an enabling condition (e.g., the above module has the
enabling condition true).
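
The evaluation of this module can be mimicked in Python as follows. The sketch is an illustration under stated assumptions: the helper eval_rules_max is a hypothetical stand-in for the max_of_true_rules policy, and the divisions are assumed to be integer divisions, which the Vortex listing does not state.

```python
def eval_rules_max(rules, default):
    """Evaluate all rules; return the largest contribution among rules whose
    condition holds, or the default value if no condition holds."""
    results = [value() for cond, value in rules if cond()]
    return max(results) if results else default


def frustration_aggregate(sv):
    """sv plays the role of sorted_vector (at least three entries)."""
    rules = [
        (lambda: sv[1] >= 100, lambda: 2 * (sv[0] + sv[1] + sv[2])),
        (lambda: sv[0] >= 100, lambda: sv[0] + sv[1] + sv[2]),
        # Assumption: "/" in the module denotes integer division.
        (lambda: sv[0] < 100, lambda: sv[0] + sv[1] // 2 + sv[2] // 4),
    ]
    return eval_rules_max(rules, 0)
```

For instance, with sv = [120, 110, 40] the first two rules apply and the larger contribution, 2 * 270 = 540, is selected.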

A collection of properties was proposed for MIHU, so that commercial policy
and integrity can be ensured. Some of them are listed below.

1. AWD should not be provided when there are no human agents for support. 

2. MIHU should not provide AWD when one AWD has already been launched. 

3. Do not show AWD to any user who has been idle for two hours or more. 

4. No AWD should be displayed within the first three pages in any session. 

We define P as: there are no human agents for support, or at least one AWD
has already been launched, or the user has been idle for more than two hours, or
no more than three pages have been displayed. If P is false, then the following
properties should hold.

5. MIHU should guarantee that the frustration score of any customer is below 
90, or an AWD should be launched. 

6. When a customer has more than 5 items in his shopping cart, an AWD frame
should be launched (to encourage shopping).

7. When a customer has searched for the same item more than 5 times, an AWD
frame should be launched to help him.

These properties can be expressed as invariants using the Vortex attributes.
For example, property (1) can be expressed as AG( IsEndOfExecution -> (
agent_available==0 -> !offer_AWD) ), using the CTL operator AG.
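
To make the reading of such an invariant concrete, the following Python sketch checks property (1) over an explicitly enumerated set of states. It is only an illustration: the dictionary-based state representation and the function names are assumptions, and SMV performs this check symbolically rather than by enumeration.

```python
def property_1(state):
    """IsEndOfExecution -> (agent_available == 0 -> !offer_AWD)."""
    if not state["IsEndOfExecution"]:
        return True  # the implication holds trivially before the end
    if state["agent_available"] != 0:
        return True
    return not state["offer_AWD"]


def holds_globally(reachable_states, prop):
    """AG prop: prop must hold in every reachable state."""
    return all(prop(s) for s in reachable_states)
```

A single reachable end state with no available agent and offer_AWD set is enough to falsify the property.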

4 From Vortex to SMV 

It is not unusual for a Vortex workflow system to have many integer variables
(e.g., more than 50). Assuming 50 integer variables with 16-bit integer width,
the state space would exceed 10^240. Explicit exploration of such a huge state
space would be computationally very expensive. On the other hand, BDD-based
symbolic model checking can cope with large state spaces, provided that they can be
represented compactly using BDDs. It is true that in the presence of nonlinear
constraints the BDD representation is not efficient. However, a large class of
workflows and decision flows used in applications is linear. MIHU is such an example.
Therefore we chose the BDD-based symbolic model checker SMV to investigate
the feasibility of automated verification of Vortex programs.
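
The size estimate for 50 integer variables of 16 bits each can be checked with a short calculation. The Python sketch below is only a worked example of that arithmetic; the function name is introduced here for illustration.

```python
import math


def state_space_bits(num_attrs, width):
    """Number of boolean state variables for num_attrs integers of the given width."""
    return num_attrs * width


bits = state_space_bits(50, 16)              # 800 boolean variables
# The state space has 2**bits states; express its size as a power of ten.
decimal_exponent = math.floor(bits * math.log10(2))
```

Here bits is 800 and decimal_exponent is 240, i.e., the number of states 2^800 lies between 10^240 and 10^241.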

4.1 Straightforward Mapping 

Specification constructs provided in the SMV input language enable a
straightforward translation of Vortex programs. Since SMV does not have global
variables, a module named Attributes is set up to hold attributes, and is passed
as an argument to each computation module. Each attribute in a Vortex program
is mapped into one boolean variable or a set of boolean variables, depending on
whether it is a boolean or an integer attribute. Also, for each attribute attr_i, a
boolean variable attr_i_enabled is set up to represent the status of attr_i. Notice
that since integer attributes are translated into a set of boolean variables in SMV,
the arithmetic computation over them has to be translated to a series of logical
relations over those boolean variables. Since SMV’s mapping of integer variables is
inefficient, we used a macro developed by Chan for this purpose [4].

As shown in Fig. 2, for each module in Vortex, the mapped SMV module
contains two assignments, one for the attribute and one for the status
variable. Both assignments are case statements, and each rule in a Vortex
module corresponds to a case of the case statement. The construction of
the condition for each case varies according to the rule policies provided in
the Vortex language, such as first_rule_win and max_rule_win, but the methodology
to build the condition is the same. For example, if the policy is first_rule_win,
the condition for each case in the SMV case statement is constructed by conjoining
the enabling condition of the module with the rule condition and another
condition called anc_enabled. The boolean condition anc_enabled is defined as
the conjunction of all status variables of attributes that are direct predecessors,
in the dependency graph, of the attribute that is being computed. Another example,
for the policy max_rule_win, is shown in Fig. 2. Notice that the first case states
that when the attribute has already been assigned its value, it should keep that
value. This helps reduce the size of the transition BDD.
// anc_enabled: all direct predecessors of attr have been computed
// IsMax(res): among all available results, res is the largest.
// It is defined as follows:
// (cond1 ->(res>=r1)) & (cond2 ->(res>=r2)) & ... & (condn ->(res>=rn))

Vortex Module

MODULE compute_attr
enabling condition: e_cond;
attr=eval_rules{
  policy: max_rule_wins,
  rules: {
    //rule 1
    if cond1 then contribute r1
    //rule 2
    if cond2 then contribute r2
    ...
    //rule n
    if condn then contribute rn
  }
}

SMV Module

MODULE compute_attr
ASSIGN
  next(attr) := case
    //if already assigned, keep the value
    attr_enabled: attr;
    //rule 1
    e_cond & anc_enabled & cond1 & IsMax(r1): r1;
    //rule 2
    e_cond & anc_enabled & cond2 & IsMax(r2): r2;
    ...
    //rule n
    e_cond & anc_enabled & condn & IsMax(rn): rn;
  esac;

  //now for status variable
  next(attr_enabled) := case
    e_cond & anc_enabled: 1;
    1: 0; //default
  esac;

Fig. 2. SMV Translation of a Vortex Module
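
The guards used in the SMV case statement of Fig. 2 can be read as ordinary boolean functions. The following Python sketch is an illustration only: is_max and next_attr are hypothetical names, and returning the old value when no case applies is an assumption made here, since Fig. 2 does not show such a default case.

```python
def is_max(res, conds, results):
    """IsMax(res): res >= r_k for every rule k whose condition cond_k holds."""
    return all(res >= r for c, r in zip(conds, results) if c)


def next_attr(attr, attr_enabled, e_cond, anc_enabled, conds, results):
    """Mimic the case statement for next(attr) under the max policy."""
    # First case: if already assigned, keep the value.
    if attr_enabled:
        return attr
    # Rule cases: the first rule whose guard holds determines the value.
    for cond, res in zip(conds, results):
        if e_cond and anc_enabled and cond and is_max(res, conds, results):
            return res
    # Assumption: no case applies, so the value is left unchanged.
    return attr
```

With conditions [True, False, True] and results [3, 99, 7], only the third guard holds, so the selected value is 7, which is the maximum over the rules whose conditions are true.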



4.2 Sequential Execution 

Unfortunately, experimental results have shown that the straightforward mapping
is inefficient. It took more than 24 hours of CPU time to build the transition
BDD of 10-bit integer width for MIHU, even without verifying any properties.
The reason is that, given a BDD variable order in which all integer bits are
interleaved, building a monolithic transition BDD for multiple computations can
be exponentially expensive. For example, if there are two computations c := a + b
and f := d + e that can be executed in parallel, then there are two options to
build the transition BDD. One is to build the monolithic transition BDD that
incorporates both computations. The other is to introduce a sequence number,
and split the computation into two steps. If the integer width is 16, given an
interleaved integer variable order, a single addition BDD has 131 nodes, the
transition BDD built by the first approach has 506 nodes, and the BDD built by the
second approach has 266 nodes. Clearly the second approach, splitting the
computation into sequential steps, is better.

It might be argued that if we arrange a, b, c and d, e, f in two groups
and interleave them separately, a much smaller transition BDD can be generated
for the first approach. This is true for this small example. However, it is not
applicable to Vortex programs. Since all attributes in a Vortex program contribute
to the target attribute, any two attributes are logically related. This prevents
a grouping of variables based on the dependencies. Therefore, a natural BDD
variable order for Vortex programs is to interleave all integer variables together.
For such a BDD variable order, the straightforward translation fails because it
incorporates too many computations in one step.

In our straightforward mapping, the set of status variables for attributes
dictates the execution order. Each snapshot (the values of status variables)
decides which modules can be executed. However, multiple modules can be
executed in one step, which ultimately leads to an unreasonably large transition BDD.
Our improved approach is quite simple: replace the status variables with
an explicit sequence variable, and make sure that at each step only a small
number of computations can be executed. Generally, we assign one sequence
number to each Vortex module. If a module contains too much computation,
then several sequence numbers can be assigned to a single module.

Fig. 3 shows the structure of the transition relation BDD using sequence 
numbers. The SMV translation in Fig. 3 is similar to the one in Fig. 2 except 
that the conditions on status variables are replaced by conditions on sequence 
numbers. 

SMV Module

// suppose the sequence number assigned
// to this module is 20
MODULE compute_attr
ASSIGN
  next(attr) := case
    //keep the value, if it is already assigned
    seq>20 : attr;
    // case 1
    seq==20 & enabling_cond & cond1: res1;
    // case 2
    seq==20 & enabling_cond & cond2: res2;
    ...
    // case n
    seq==20 & enabling_cond & condn: resn;
  esac;

Fig. 3. Disjunctive Structured Transition BDD

As shown in Fig. 3, the transition BDD has a disjunctive structure, so the
overall transition BDD size is at worst the sum of the sizes of the BDDs of all
these submodules. Notice that these submodules can share components, which would
further reduce the overall transition BDD size. Given a fixed integer width, if all
the computations are linear, the total transition BDD size is linear in the program
size. It can also be proved that the size is linear in the integer width.

Because of the declarative semantics of Vortex programs, if an execution
sequence satisfies the partial order defined by the dependency graph, the final
snapshot of all attributes in the SMV translation will be the same as in the original
Vortex program. Note that all properties are expressed over the final values of
attributes. Hence restricting to a particular execution sequence does not
affect the verification.
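
The independence of the final snapshot from the chosen execution sequence can be illustrated on a toy example. The Python sketch below is not the MIHU program: the dependency graph DEPS and the computations in COMPUTE are hypothetical. It enumerates every order that respects the dependencies and evaluates the attributes in that order.

```python
from itertools import permutations

# Hypothetical dependency graph: attribute -> attributes it depends on.
DEPS = {"a": [], "b": [], "c": ["a", "b"], "d": ["a"], "target": ["c", "d"]}

# Hypothetical computations for the non-source attributes.
COMPUTE = {
    "c": lambda s: s["a"] + s["b"],
    "d": lambda s: 2 * s["a"],
    "target": lambda s: max(s["c"], s["d"]),
}


def respects(order):
    """True if every attribute appears after all attributes it depends on."""
    pos = {name: i for i, name in enumerate(order)}
    return all(pos[p] < pos[n] for n in order for p in DEPS[n])


def run(order, sources):
    """Evaluate attributes one per step, in the given sequence."""
    snap = dict(sources)
    for name in order:
        if name in COMPUTE:
            snap[name] = COMPUTE[name](snap)
    return snap


def final_snapshots(sources):
    """Final snapshots for all execution sequences allowed by DEPS."""
    return [run(order, sources) for order in permutations(DEPS) if respects(order)]
```

Every admissible sequence produces the same final snapshot, which is why fixing one sequence number per module does not change the outcome of checking properties over final values.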



[Fig. 3 diagram label: binary variables that encode the sequence number]




4.3 BDD Variable Order 

The BDD variable order we use in mapping Vortex programs to SMV modules is
shown in Fig. 4. The first part consists of all the bits of the sequence variable
seq, which holds the current sequence number. This part leads to the disjunctive
transition BDD structure. In the second part we list all the boolean
variables that correspond to boolean attributes in Vortex. Finally, the third part
corresponds to the boolean variables that encode the bits of integer attributes.
Note that if the ranges of integers are not the same, then we can interleave
starting from the least significant bit.



sequence number:       seq_bit1, seq_bit2, ..., seq_bit16

boolean attributes:    bool_var1, bool_var2, ..., bool_varn

interleaved integers:
  bit 1 layer:     int_var1_bit1, int_var2_bit1, ..., int_varn_bit1
  bit 2 layer:     int_var1_bit2, int_var2_bit2, ..., int_varn_bit2
  ...
  bit 16 layer:    int_var1_bit16, int_var2_bit16, ..., int_varn_bit16

Fig. 4. BDD Variable Order
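
The order of Fig. 4 is easy to generate mechanically. The following Python sketch is an illustration with hypothetical names; it assumes that all integer attributes have the same width and does not fix which bit index is the least significant one.

```python
def bdd_variable_order(seq_bits, bool_vars, int_vars, int_width):
    """Sequence bits first, then boolean attributes, then integer bits
    interleaved layer by layer (one layer per bit position)."""
    order = [f"seq_bit{i}" for i in range(1, seq_bits + 1)]
    order += list(bool_vars)
    for bit in range(1, int_width + 1):
        # One bit layer: the same bit position of every integer attribute.
        order += [f"{var}_bit{bit}" for var in int_vars]
    return order
```

For two sequence bits, one boolean attribute p and two 2-bit integers x and y, the result is seq_bit1, seq_bit2, p, x_bit1, y_bit1, x_bit2, y_bit2.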

In Fig. 4, the ordering of variables within each bit layer can affect the size of
the transition BDD. In any submodule, there are two kinds of computations: the
“main computation” of that step, and the assignments that preserve old values for
some attributes. If there are too many such attributes, the preserving assignments
can also result in quite a large BDD, especially when these attributes are
mixed with the attributes for the main computation. It is better to move these
unrelated attributes outside the group of attributes used in the main computation,
as illustrated in Fig. 5. (In Fig. 5 the edges toward BDD node 0 are all omitted.
Attributes u1, u2 are unrelated to the main computation, and r1, r2,
r3 are attributes used in the computation. The attributes with a prime (e.g., r3')
are next values.) Therefore, as a heuristic, within each bit layer any attributes
directly related in the dependency graph should be placed close to each other, to
minimize the number of unrelated attributes in the main computation for each
module.

Using the ideas presented above, the transition BDD for the MIHU Vortex
program can be built in 10 seconds for a 10-bit integer representation. This is a
significant improvement over the straightforward mapping.



5 Optimization 

Although the techniques presented in Section 4 enabled us to construct a BDD 
representation for the Vortex program in a reasonable time, the SMV model 
checker was not able to verify most of the properties presented in Section 3 using 
this transition BDD. In this section we present two techniques which enabled us 
to verify all the presented properties. 
Main computation: $r3' := (r1 \land r2) \lor (\lnot r1 \land \lnot r2)$

Fig. 5. Effects of Different Variable Order in Bit Layer

5.1 Variable Pruning 

Because dependencies in Vortex are acyclic, there is no loop in the execution
of any Vortex program. This means that every attribute has a “lifespan”, i.e.,
before its first reference and after its last reference, an attribute is of no use for
the computation. Note that for intermediate attributes, the “first reference” is their
definition; for source attributes, the “first reference” is the first time they are
used in a computation. Naturally, an attribute can be pruned outside of its lifespan:
we can treat such variables as “don't cares” by assigning them any possible value
nondeterministically. This greatly reduces the cost of the “preserving assignments”
in the sub-BDDs for each module. For example, there are 40 integers in MIHU,
so in many modules there would be over 30 preserving assignments. After
we pruned unnecessary variables by nondeterministic assignment, there are at any
time fewer than 9 preserving assignments. For image computation, the
improvement is even greater. There is no need to represent the state of
inactive variables; at any time, only the active variables are represented in the
image. So the problem size is reduced from a huge state space over all
attributes to the state space over the variables that are active. Usually the
number of active variables is much smaller than the total number of variables, e.g.,
in MIHU the maximal number of active variables during any part of the computation
is 9. Therefore, after pruning inactive variables, we can achieve much better
performance than with the original naive method.
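
The lifespans and the resulting sets of active variables can be computed directly from the sequence of modules. The Python sketch below is an illustration under stated assumptions: the representation of a module as a pair (defined attribute, referenced attributes) and the function names are hypothetical, and each module is assumed to occupy exactly one step.

```python
def lifespans(modules):
    """modules: list of (defined_attr, referenced_attrs) in execution order.

    Returns attr -> (first_reference_step, last_reference_step). For an
    intermediate attribute the first reference is its definition; for a
    source attribute it is the first step that uses it.
    """
    first, last = {}, {}
    for step, (target, refs) in enumerate(modules, start=1):
        for attr in [target] + list(refs):
            first.setdefault(attr, step)
            last[attr] = step
    return {attr: (first[attr], last[attr]) for attr in first}


def active_at(step, spans):
    """Attributes inside their lifespan at the given step; all others can be
    pruned, i.e., assigned nondeterministically."""
    return {attr for attr, (lo, hi) in spans.items() if lo <= step <= hi}
```

In a three-module example where c is computed from a and b, d from c, and e from d and a, attribute b is active only at step 1 and can be pruned afterwards, while a stays active until step 3.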

5.2 Decomposition of Initial Image 

Similar pruning can be applied to source attributes. They can be assigned any
value nondeterministically outside their lifespan. What is more, the initial
constraints can be projected to different stages of execution, which reduces the size
of the initial image drastically.

The process to project the initial image is as follows. Suppose that the initial
constraint is expressed in the conjunctive form $\bigwedge f(x_i, x_j)$, where
$f(x_i, x_j)$ is a logical relation between two attributes. Then for each
$f(x_i, x_j)$, suppose the lifespans of the two attributes are $[a_i, b_i]$ and
$[a_j, b_j]$, with $a_i \geq a_j$. We can map $f(x_i, x_j)$ into the module of step
$a_i - 1$. To do this, the transition $f(next(x_i), x_j)$, or
$f(next(x_i), next(x_j))$ if $a_i = a_j$, is added to module $a_i - 1$.

For example, suppose we have an initial constraint over the array log[ ] stating
that log must keep increasing, i.e., log[0] < log[1] < log[2] < log[3]. Suppose
that log[2] is first referenced in module 5, and log[1] is first referenced in
module 7. Then the constraint log[1] < log[2] can be mapped into module 6.
The program is presented as follows.

MODULE No. 6
  //original assignments

  TRANSITION
  next(log[1]) < log[2]

Notice that the “next” indicates that the next value of log[1] should satisfy
this constraint in module 7. However, the value of log[2] will not change, since
it is already assigned before module 5.
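
The placement rule for a pairwise constraint can be written down compactly. The Python sketch below is an illustration only: the function names and the string rendering are hypothetical, and first_ref is assumed to give the step at which each attribute's lifespan starts.

```python
def project_constraint(first_ref, x, y):
    """Decide where a pairwise initial constraint f(x, y) is placed.

    Returns (module, primed): the module that receives the transition
    constraint, and the attributes that appear under next(...) in it.
    """
    a_x, a_y = first_ref[x], first_ref[y]
    later = max(a_x, a_y)
    # The attribute whose lifespan starts later (or both, if the lifespans
    # start together) is constrained through its next value.
    primed = {v for v, a in ((x, a_x), (y, a_y)) if a == later}
    return later - 1, primed


def render(op, x, y, primed):
    """Render the transition constraint as text, e.g. next(log[1]) < log[2]."""
    def wrap(v):
        return f"next({v})" if v in primed else v
    return f"{wrap(x)} {op} {wrap(y)}"
```

For the example above, with log[1] first referenced in module 7 and log[2] in module 5, the constraint is placed in module 6 and rendered as next(log[1]) < log[2].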

Constraints over arrays commonly arise in practical applications
(e.g., the dates stored in a history log should always keep increasing). This
kind of constraint is troublesome for the initial image and leads to quite a large
initial image size. We have three such constraints over arrays in MIHU,
which lead to an initial image of 65,000 BDD nodes when the integer width is
10. The first forward image computation takes more than one hour. After the
decomposition optimization was applied, the initial image
shrank from 65,000 BDD nodes to 4,000 nodes, and the time for the first forward
image computation dropped drastically, to only half a second.

5.3 Execution Sequence 

As we discussed above, the size of the active variable set critically affects the
cost of verification, i.e., the verification time increases exponentially with the
size of the active variable set. The smaller the active set, the smaller the
verification cost. Note that different execution sequences lead to different active
variable sets, e.g., the size of the largest active set in MIHU ranges from 9 to 17
depending on the execution sequence. Therefore, it is important to find an optimal
execution sequence which minimizes the largest active variable set.

We formalize the notions used in our discussion below. Let G be a given
dependency graph.

Execution sequence: suppose the total number of attributes is n. Each
attribute attr is mapped to a unique integer in the range [1..n]. The mapping
should satisfy the partial order defined by the dependency graph G.

Lifespan: For each attribute attr, its lifespan is a range [i, j], where i is attr’s
sequence number assigned in the execution sequence, and j is the maximum sequence
number among all its successors.

$AVS_i$ (Active Variable Set at step i): the set of variables active at step i,
i.e., $AVS_i = \{attr \mid i \in attr\text{'s lifespan}\}$.

Minimum Active Variable Set (MAVS) problem: Find the execution sequence
which minimizes the maximum value of $|AVS_i|$, $1 \leq i \leq n$.

Thongh the worst case complexity is exponential, a branch and bonnd al- 
gorithm can be nsed to solve the MAVS problem. Since the MAVS problem is 
actnally to find ont an optimal topological order for a directed acyclic graph, dnr- 
ing the depth-first search to ennmerate all possible topological seqnences, we can 
set a bonnd on the optimal solntion cnrrently available and prnne branches that 
exceed the bonnd. In addition, greedy search can be started first to find a “not 
bad” valne to initialize the bonnd. Unfortnnately, polynomial time algorithms 
for MAVS seems nnlikely as we can show that the problem is actnally NP-hard 
by a redaction from a known NP-complete problem called register sufficiency 
problem [7]. 
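
The branch and bound search described above can be sketched as follows. This is a minimal illustration, not the implementation used for MIHU: the dictionary-based graph encoding and the names `max_active` and `solve_mavs` are assumptions, an attribute without successors is assumed to have the one-point lifespan [i, i], and the greedy initialization of the bound is replaced by a trivial bound.

```python
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[str, Set[str]]  # attribute -> its successors in the dependency graph


def max_active(order: Sequence[str], succ: Graph) -> int:
    """Return max_i |AVS_i| for one non-empty execution sequence (1-based steps)."""
    pos = {a: i for i, a in enumerate(order, start=1)}
    # Lifespan of a = [pos[a], largest position among a and its successors].
    end = {a: max([pos[a]] + [pos[s] for s in succ[a]]) for a in order}
    return max(
        sum(1 for a in order if pos[a] <= i <= end[a])
        for i in range(1, len(order) + 1)
    )


def solve_mavs(succ: Graph) -> Tuple[List[str], int]:
    """Branch and bound over all topological orders of the dependency graph."""
    nodes = sorted(succ)
    preds: Dict[str, Set[str]] = {a: set() for a in nodes}
    for a in nodes:
        for s in succ[a]:
            preds[s].add(a)

    best_order: List[str] = []
    best_cost = len(nodes) + 1  # trivial bound; a greedy run could tighten it

    def dfs(prefix: List[str], placed: Set[str], cost: int) -> None:
        nonlocal best_order, best_cost
        if cost >= best_cost:
            return  # prune: this branch cannot beat the best solution so far
        if len(prefix) == len(nodes):
            best_order, best_cost = list(prefix), cost
            return
        for a in nodes:
            if a in placed or not preds[a] <= placed:
                continue  # already scheduled, or a predecessor is still missing
            # Active at this step: a itself, plus every earlier attribute that
            # still has a successor not placed before this step.
            live = 1 + sum(1 for b in prefix if succ[b] - placed)
            prefix.append(a)
            placed.add(a)
            dfs(prefix, placed, max(cost, live))
            prefix.pop()
            placed.remove(a)

    dfs([], set(), 0)
    return best_order, best_cost


# Hypothetical dependency graph: a diamond a -> {b, c} -> d plus a chain x -> y.
deps = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set(), "x": {"y"}, "y": set()}
order, cost = solve_mavs(deps)
```

On this toy graph, scheduling `x` first keeps it alive across the whole diamond, which is exactly the kind of sequence the bound prunes.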



5.4 Experimental Results 

We were able to check all properties for the MIHU example given in Section 3
using the techniques discussed above. The hardware we used is a 450 MHz
Pentium PC with 128 MB of memory. Properties 1 and 3 were not satisfied. The
errors were due to missing bounds on some of the attributes. For example,
AMD .score was not bounded in the program, which resulted in the system
assigning an agent to a customer though none were available. The experimental
results are as follows.
Integer attributes: 40 

Source code in Vortex program MIHU: 800 lines 
Maximal active variable set: 9 attributes 



Integer Width   Time (Seconds)   Transition BDD Size   Memory Used (MB)
10              250              269,241               32
11              360              312,548               37
12              560              356,936               39
13              900              401,907               40
14              1500             446,821               42
15              4500             539,843               45



We verified the properties using forward state exploration. We first
generated all the reachable states and then verified all the invariants at
once. In the MIHU example the most costly computation is the sorting operation
that computes sorted-vector. It may be possible to improve sorting in such
cases using predicate abstraction.

6 Predicate Abstraction 

It is not unusual for the complexity of verification to increase exponentially
with the integer width. None of the optimization approaches mentioned above
can solve this problem. However, the combination of predicate abstraction
[9,5] and a decision procedure can help. Any predicate can be mapped to a
boolean variable, which is assigned at the last module that defines its
related attributes. If, with the aid of a decision procedure, we can conclude
that this boolean variable can be deterministically assigned 0 or 1, the
abstraction of the predicate is complete. Otherwise we can either leave it
nondeterministically assigned, or continue to generate new predicates, spread
out over more predicates, until the assignment becomes deterministic. It has
been proved [16] that for the nondeterministic abstraction ∀CTL* properties,
i.e., properties that hold on all execution paths, are preserved, which means
that such properties verified to be correct in the abstracted version are
also correct in the original version; for the deterministic abstraction, in
addition to CTL* preservation, properties proved to be incorrect in the
abstracted system do not hold in the concrete system either.

We tried nondeterministic abstraction in MIHU, which generated 120 new
boolean predicates. With this abstraction, all correct properties are verified
in 10 minutes. However, for the properties that were not verified, the
abstraction has to be refined to eliminate false negative results in the error
trace. This process should be repeated until a fully deterministic abstraction
is generated [16]. We estimate that generating the fully deterministic
abstraction for MIHU will require over 500 new predicates.

One bottleneck in the deterministic abstraction method is that it is required
to enumerate the boolean combinations over all boolean variables so that

$$\mathit{InitialImage} \Rightarrow R(f(b_1, \ldots, b_n))$$

where $f(b_1, \ldots, b_n)$ is a boolean expression on boolean variables
$b_1, \ldots, b_n$ (which represent abstracted predicates), and $R()$ is the
inverse of abstraction. The goal is to generate a conjunction which is minimal
and satisfies the above constraint. The best known complexity of an
enumeration algorithm to generate such a boolean expression is $3^k$ [16],
where $k$ is the number of all predicates. If $k$ is quite large, the overhead
of abstraction would be expensive.

7 Conclusions 

In this paper we presented applications of model checking to the verification
of Vortex workflow programs. We developed techniques for mapping Vortex
programs to HDDs efficiently based on the control structure and the
declarative semantics of the language. Using these techniques, we were able to
verify the Vortex program used for online customer support. We showed that
this program did not satisfy all of the stated properties due to missing
assumptions on some of the attributes.

We plan to investigate using other symbolic representations such as
arithmetic constraints [3] or composite representations [2] for verification
of Vortex programs. We will also do more experiments on abstraction
techniques, and develop a customized model checker for the Vortex language.

Abstract. The growing popularity of multi-threading has led to a great
number of software libraries that support access by multiple threads. We
present Local/Global Finite State Machines (LGFSMs) as a model for a
certain class of multithreaded libraries. We have developed a tool called
Beacon that does parameterized model checking of LGFSMs. We demonstrate
the expressiveness of LGFSMs as models, and the effectiveness of
Beacon as a model checking tool by (1) modeling a multithreaded memory
manager Rockall developed at Microsoft Research as an LGFSM,
and (2) using Beacon to check a critical safety property of Rockall.



1 Introduction 

Software libraries traditionally have been designed for use by single- 
threaded clients. Due to the increasing use of multi-threading both on 
servers and clients, most libraries designed today accommodate simul- 
taneous access by a multitude of threads. A software library typically 
provides its interface through a set of functions that a thread can call. 
In the context of the object-oriented paradigm, the library might simply 
export a set of classes and make its services accessible via the public meth- 
ods of these classes. Furthermore, the library usually maintains internal 
state (using member variables of classes) which can be modified during 
the execution of invoked methods. Even though multiple threads can ac- 
cess a library simultaneously, the library provides a consistent sequential 
semantics to all threads. 

We are interested in checking properties of multithreaded software 
libraries. In particular, we are interested in checking that a library is well- 
behaved with respect to sequences of calls made upon it by a multitude 
of client threads. For object-oriented libraries this boils down to checking 
if the internal state of the library is always correct irrespective of the 
number of threads calling it and the interleaving of the executions of the 
calls made by these threads. More precisely, we want to ensure that the 
library is thread-safe. 

* Microsoft Research, tball@microsoft.com

We recently proposed boolean programs [BR00b, BR00a] as a model
for representing abstractions of imperative programs written in languages
such as C. Boolean programs are imperative programs in which all
variables have boolean type. Boolean programs contain procedures with
call-by-value parameter passing and recursion. Questions such as invariant
checking and termination (which are undecidable in general) are
decidable for boolean programs.

In order to model multi-threaded programs, we have extended the
boolean program model with threads. Threads in a multi-threaded
boolean program execute asynchronously, and communicate with each
other using shared global variables. If $B_1$ and $B_2$ are two threads of a
boolean program, we denote their asynchronous composition by $B_1 \| B_2$.
Unfortunately, even for boolean programs with only two threads, invariant
checking is undecidable (this can be proved along the lines of [Ram99]).

Nonetheless, in practice we believe that the interaction between
threads (in boolean programs as well as programs in general) usually
can be modeled by a finite state machine. Therefore, we further abstract
each thread of a boolean program to an LGFSM (local/global finite state
machine), which makes the distinction between local and (shared) global
states explicit. The relationship between a boolean program $B$ and its
LGFSM abstraction $F$ is one of refinement: the boolean program refines
the interaction behavior specified by its LGFSM abstraction. We write
this as $B \Rightarrow F$.

Suppose $B_1$ and $B_2$ are two threads of a boolean program $B$, whose
interactions are described by LGFSMs $F_1$ and $F_2$ respectively. Then the
following proof rule can be used to check if the composition of $B_1$ and
$B_2$ satisfies invariant $\varphi$:

(1) $B_1 \Rightarrow F_1$
(2) $B_2 \Rightarrow F_2$
(3) $F_1 \| F_2 \models \varphi$
------------------------------
(4) $B_1 \| B_2 \models \varphi$

Note that proof obligations (1) and (2) involve checking refinement
between a boolean program and an LGFSM, and proof obligation (3)
involves checking if a composition of two LGFSMs satisfies an invariant.
All these questions are decidable.

Now, suppose that we want to check if a boolean program with an
arbitrary number of threads satisfies an invariant $\varphi$. Let $B^*$ denote
the composition of an arbitrary number of threads of a boolean program $B$.
Then the following proof rule can be used to check if $B^*$ satisfies
invariant $\varphi$:

(5) $B \Rightarrow F$
(6) $F^* \models \varphi$
------------------------------
(7) $B^* \models \varphi$

Thomas Ball, Sagar Chaki, and Sriram K. Rajamani


In this paper, we give an algorithm to automatically check proof
obligation (6), which has been implemented in a tool called Beacon. We model
each thread ($F$) of a multi-threaded library by a local/global finite state
machine, or LGFSM. An arbitrary number of instances of an LGFSM
comprise a parameterized library system, or PLS for short. We consider
the question of whether or not a particular global state (a particular
valuation to the global variables) is reachable in a PLS. We show that
this problem is decidable, even when there are an arbitrary number of
LGFSMs.

The results of this paper are four-fold: 

— We formally define the LGFSM and PLS models, which can be used 
to model a wide class of concurrent software systems, namely those in 
which multiple anonymous clients require the services of a centralized 
library. 

— Given a PLS system with $m$ global states and $n$ local states, we
show that: (1) a global state is reachable in a PLS comprised of an
arbitrary number of threads iff it is reachable in a PLS comprised of
$2m^{(n+1)!}$ threads; (2) the global state reachability problem for a PLS
can be decided deterministically in space
$O(2^{cn\log(n) + 2\log\log(m)})$ and time
$O(2^{2^{cn\log(n) + 2\log\log(m)}})$. These complexity results are based
directly on the work of Rackoff [Rac78].

— We present an LGFSM model of an industrial-strength multi-threaded
memory manager called Rockall, developed in Microsoft
Research. Rockall is written in C++. We manually wrote a boolean
program abstraction of a single thread of Rockall, and (automatically)
inlined the procedure calls to obtain an LGFSM. The LGFSM
model has $m = 2048$ global states and $n = 256$ local states. In the
LGFSM for Rockall, the global states represent the internal data
structures of the memory manager while the local states represent
the states of the clients of the memory manager.

— We present an algorithm for checking the reachability of a global state
in a PLS that is similar to the algorithm for computing the minimal
coverability graph for Petri nets presented in [Fin93]. The algorithm
has been implemented in a tool called Beacon. When applied to the
Rockall model, Beacon was able to prove a critical safety property of
the model in about 4 hours, despite the fact that the algorithm might
have had to explore a system with $2 \times 2048^{257!}$ threads, in the
worst case.

The paper is organized as follows. Section 2 defines the LGFSM and
PLS models, defines the global state reachability problem and shows that
it is decidable. Section 3 introduces the Rockall memory manager and
describes our LGFSM model of Rockall. Section 4 gives our algorithm
for determining the reachability of a global state in a PLS, proves that
the algorithm terminates and is sound and complete, and describes our
experiences applying Beacon to Rockall. Section 5 discusses related work
and Section 6 concludes the paper.

2 Modeling Multi-threaded Libraries 

This section formally defines the concepts of the local/global finite-state
machine (LGFSM) model and a parameterized library system (PLS),
presents the reachability problem for a PLS, and shows that this problem
is decidable. Finally it highlights some relationships between PLS and
Petri nets.

2.1 Model 

An LGFSM $P$ is a 4-tuple $(\Lambda_P, \Gamma_P, \sigma_P, T_P)$, where

— $\Lambda_P$ is a finite set of local states.

— $\Gamma_P$ is a finite set of global states.

— $\sigma_P \in \Lambda_P \times \Gamma_P$ is the initial state.

— $T_P \subseteq \Lambda_P \times \Gamma_P \times \Lambda_P \times \Gamma_P$
is a transition relation that prescribes how a pair of local and global
states transitions to another pair of local and global states.

Given an LGFSM $P$, and $f \ge 1$, the parameterized library system
$P_f$ consists of an interleaving composition of $f$ instances of $P$, where
all the instances share the same global states. Formally, $P_f$ is a finite
state machine $(\Sigma_{P_f}, \sigma_{P_f}, T_{P_f})$, where

— The states in $\Sigma_{P_f}$ are $(f+1)$-tuples in
$\Lambda_P^f \times \Gamma_P$. For a state
$\sigma = (l_1, \ldots, l_f, g) \in \Sigma_{P_f}$, we define projection
operators $\sigma(i)$, for $1 \le i \le f+1$, to extract the components of
$\sigma$.

— $\sigma_{P_f}$ is $(l, l, \ldots, l, g)$, where $(l, g) = \sigma_P$ and
$|\sigma_{P_f}| = f + 1$.

— $T_{P_f} \subseteq \Sigma_{P_f} \times \Sigma_{P_f}$ is a set of
transitions, such that
$((l_1, \ldots, l_f, g), (l'_1, \ldots, l'_f, g')) \in T_{P_f}$ if for some
$1 \le i \le f$, we have that $\tau = ((l_i, g), (l'_i, g')) \in T_P$, and
for all $j$, where $1 \le j \le f$ and $i \ne j$, we have that $l_j = l'_j$.
We say that the second state of the transition is the image of the first
state under the transition. Formally,
$(l'_1, l'_2, \ldots, l'_f, g') = \mathit{Image}((l_1, l_2, \ldots, l_f, g), \tau)$.

A sequence $\sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_j$ over
$\Sigma_{P_f}$ is a trajectory of $P_f$ if (1) $\sigma_0 = \sigma_{P_f}$ and
(2) for all $0 \le i < j$, we have $(\sigma_i, \sigma_{i+1}) \in T_{P_f}$. A
state $\sigma$ is reachable in $P_f$ if there exists a trajectory that ends
in $\sigma$. A global state $g \in \Gamma_P$ is reachable in $P_f$ if there
exists a reachable state $\sigma$ in $P_f$ such that $\sigma(f + 1) = g$.
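
For a fixed number of threads $f$, the definitions above describe an ordinary finite state machine, so global state reachability can be checked by explicit search. The following sketch illustrates this under stated assumptions: states are encoded as Python tuples, and the three-transition `LOCK` machine is a hypothetical toy example, not a model taken from Rockall.

```python
from collections import deque
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]            # (local state, global state)
Transition = Tuple[Pair, Pair]    # ((l, g), (l', g')), an element of T_P


def reachable_globals(init: Pair, trans: Iterable[Transition], f: int) -> Set[str]:
    """Global states reachable in P_f, found by breadth-first search."""
    trans = list(trans)
    l0, g0 = init
    start = (l0,) * f + (g0,)     # (l, l, ..., l, g) with f + 1 components
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        locs, g = state[:-1], state[-1]
        for i in range(f):        # interleaving: exactly one thread moves
            for (l, g_from), (l2, g2) in trans:
                if locs[i] == l and g == g_from:
                    nxt = locs[:i] + (l2,) + locs[i + 1:] + (g2,)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {s[-1] for s in seen}  # projection onto component f + 1


# Toy LGFSM: a thread takes and releases a lock; an idle thread that
# observes the lock held by someone else drives the global state to "err".
LOCK = [
    (("idle", "free"), ("busy", "held")),
    (("busy", "held"), ("idle", "free")),
    (("idle", "held"), ("idle", "err")),
]

one_thread = reachable_globals(("idle", "free"), LOCK, 1)
two_threads = reachable_globals(("idle", "free"), LOCK, 2)
```

In this toy machine `"err"` is unreachable with one thread but reachable with two, which is why a result for one fixed $f$ says nothing about the parameterized problem.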

2.2 Decidability of the Reachability Problem 

An instance of the parameterized reachability problem for software
libraries consists of an LGFSM $P$ and a global state $g \in \Gamma_P$. The
answer to the parameterized reachability problem is "yes" if there exists
some $f \ge 1$ such that $g$ is reachable in $P_f$, and "no" otherwise.

We exploit two characteristics of LGFSM models. First, in a PLS,
each state transition can change the local state component of at most one
LGFSM. Because of this restriction, it is not possible for an arbitrary
number of clients to change their local states in a single instant in a
PLS.¹ Second, because the size of the global state component is bounded
and the number of clients unbounded, it is not possible for clients to
communicate their identity to each other through the global state.

We give an upper bound on the number of threads we need to consider
in order to decide the global state reachability problem for LGFSMs. In
the sequel, we denote the number of global states in an LGFSM
($|\Gamma_P|$) by $m$ and the number of local states ($|\Lambda_P|$) by $n$.
The proofs of the following theorems are presented in [BCR00] and omitted
here for brevity.
Theorem 1. Let $P$ be an LGFSM with $m$ global states and $n$ local
states. Let $g \in \Gamma_P$. For all $f \ge 1$, global state $g$ is
reachable in $P_f$ iff $g$ is reachable by a trajectory of length at most
$2m^{(n+1)!}$ in $P_f$.

Corollary. Let $P$ be an LGFSM with $m$ global states and $n$ local states.
A global state $g$ is reachable in $P_f$ for some $f \ge 1$ iff $g$ is
reachable in $P_{2m^{(n+1)!}}$.

Theorem 2. An instance of the parameterized reachability problem with
an LGFSM that has $m$ global states and $n$ local states can be decided
deterministically in space $O(((n + 2)! \log(m))^2)$ and time
$O(2^{2^{cn\log(n) + 2\log\log(m)}})$.

¹ This is consistent with the interleaving semantics usually given to threads.

2.3 Relationship Between PLS and Petri Nets

The relationship between PLS and Petri net (PN) models of computation 
is underscored by the following two claims. 

Claim 1. Given an LGFSM $P$, we can construct a Petri net $PN$ and a
mapping $\gamma$ such that for all $f \in \mathbb{N}$, $\gamma$ maps every
reachable state of $P_f$ to a reachable state of $PN$.

Claim 2. An instance of the parameterized reachability problem for soft- 
ware libraries can be reduced to an instance of the coverability problem 
for Petri nets. 

The justifications for the claims are quite simple and are left as an
exercise for the reader. The decidability of the coverability problem for
PNs has been known since [KM69]. Combined with Claim 2, this result
gives another proof for the decidability of the parameterized reachability
problem for software libraries.
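
One plausible construction behind Claim 1 can be sketched as follows. It is an assumption, not the construction the authors had in mind (the text leaves it as an exercise): the net has one place per local state and one per global state, the mapping `gamma` turns a PLS state into a marking by counting threads per local state and putting a single token on the current global state, and each LGFSM transition moves one thread token and the global token. The names `gamma` and `fire` are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Pair = Tuple[str, str]
Transition = Tuple[Pair, Pair]


def gamma(state: Sequence[str]) -> Counter:
    """Map a PLS state (l_1, ..., l_f, g) to a marking of the assumed net."""
    locs, g = state[:-1], state[-1]
    marking = Counter(("L", l) for l in locs)  # one token per thread
    marking[("G", g)] += 1                     # one token marks the global state
    return marking


def fire(marking: Counter, tau: Transition) -> Optional[Counter]:
    """Fire the net transition for tau, or return None if it is not enabled."""
    (l, g), (l2, g2) = tau
    pre = Counter({("L", l): 1, ("G", g): 1})
    if any(marking[place] < k for place, k in pre.items()):
        return None
    out = marking - pre        # Counter subtraction drops places left empty
    out[("L", l2)] += 1
    out[("G", g2)] += 1
    return out


tau = (("idle", "free"), ("busy", "held"))
before = ("idle", "idle", "free")
after = ("busy", "idle", "held")   # Image(before, tau) with thread 1 moving
```

Firing the net transition on `gamma(before)` yields `gamma(after)`, so PLS steps and net steps commute with the mapping in this example. Asking whether a global state is reachable for some number of threads then becomes a question about covering the marking with one token on that global place.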

3 The Rockall Memory Manager 

In this section, we describe the Rockall memory manager and our 
boolean program and LGFSM models of it. 

3.1 A Quick Tour of Rockall 

Rockall is a configurable thread-safe object-oriented memory manager. 
The basic data structure that Rockall uses for managing memory is the 
“bucket”. Each bucket is responsible for allocating chunks of memory of 
a particular size. Buckets are arranged in a tree-like hierarchy. When a 
bucket runs out of memory, it requests a larger chunk of memory from 
its parent and then breaks up this big chunk into smaller chunks (cor- 
responding to its own size), which it can then allocate as needed. The 
bucket at the root of this hierarchy gets its memory directly from the 
operating system. The number of buckets, their allocation sizes, and the 
tree hierarchy can be configured by the user at startup. 

Rockall has a number of other features that are pertinent to our
modeling. First, unlike most memory managers, Rockall maintains all
information regarding the allocated memory chunks (two bits per chunk)
separately in its own data structure (a hash table) rather than padding
the memory chunk given to the user process with these bits. This
prevents the user process from accidentally (or intentionally) trampling on
the manager's data. This information is required for Rockall to
determine which bucket a memory chunk was allocated from when memory is
deallocated. Several locks are used in Rockall to ensure that each thread
sees a consistent view of memory and also to achieve high performance.

The critical safety property of Rockall that we want to ascertain is
the following: no memory location should be allocated or deallocated
by Rockall twice or more in succession. In other words, allocation and
deallocation of every memory location should occur alternately. Since
each chunk of memory is treated independently by Rockall, the actual
addresses of the memory chunks are not important for the verification of
this property. So we do away with the address values completely. Thus, the
models abstract the behavior of Rockall w.r.t. a single chunk of memory.
Also, we consider a scenario where Rockall has only two buckets, B0 and
B1, where B1 is B0's parent. Even with these restrictions, the abstract
models for Rockall are of non-trivial complexity.

A point to be noted here is that the models are conservative abstrac- 
tions of Rockall w.r.t. sequences of allocation/deallocation of a memory 
location. In other words, for every sequence of allocation/deallocation of 
a memory location done by Rockall, there exists an identical sequence 
of allocation/deallocation done by each of the models. In particular this 
applies also to sequences which violate the desired safety property. Thus 
the fact that either of the models does not violate the safety property 
implies that Rockall does not violate it. 

3.2 Boolean Program Model 

We first describe an abstract Boolean Program model for Rockall. There 
are nine global boolean variables in this model: 

— B0_lock: locks bucket B0, protects variable B0_allocated; must be
acquired before B0 can allocate or deallocate a chunk; initially free
(the variable has the value false).

— B1_lock: locks bucket B1, protects variables B1_allocated and
B1_subdivided; must be acquired before bucket B1 can allocate or
deallocate a chunk; initially free.

— newpage_lock: must be acquired before the ownership of a chunk is
transferred from one bucket to another; protects variables available
and find; initially free.

— find_lock: must be acquired before the hash table is searched to find
the bucket that owns a chunk and before the ownership of a chunk
is transferred from one bucket to another, as the hash table will be
updated as a result (newpage_lock comes before find_lock in the lock
order).

— B0_allocated: true if bucket B0 has allocated its chunk to the user
process, otherwise false; initially false.

— B1_allocated: true if bucket B1 has allocated its chunk to bucket B0
or to the user process, otherwise false; initially false.

— B1_subdivided: true if bucket B1 has allocated its chunk to bucket
B0, otherwise false; initially false.

— available: true if bucket B1 has the right to allocate the chunk, false
if bucket B0 has the right to allocate it; initially true.

— find: true if bucket B1 holds the chunk, false if bucket B0 holds it;
models the hash table; initially true.

The boolean program abstraction of Rockall contains seven procedures, 
whose behavior we summarize below: 

— B0_New(): models the allocation of a chunk to the user by bucket B0;
returns true if a successful allocation occurs and false otherwise; calls
the procedure FetchFromB1() in the case that B0 has no available
memory and needs to get memory from B1 before completing the
allocation request.

— FetchFromB1(): models the allocation of B1's chunk to B0; returns
true if a successful allocation occurs and false otherwise.

— B1_New(): models the allocation of B1's chunk to the user; returns
true if a successful allocation occurs and false otherwise.

— B0_Delete(): models the deallocation of B0's memory chunk; returns
true if a successful deallocation occurs and false otherwise, and calls
the procedure GiveToB1() in case B0 needs to return the chunk to
B1 after the deallocation.

— GiveToB1(): models the return of the chunk by B0 to B1.

— B1_Delete(): models the deallocation of B1's chunk; returns true if
a successful deallocation occurs and false otherwise.

3.3 Instrumented Program 

Recall that we want to check that no memory location is allocated or
deallocated by Rockall twice or more in succession. We add the following
instrumentation to our Rockall model in order to reduce the problem of
checking this safety property to a problem of checking an invariant.

We add two variables safe0 and safe1 to the boolean program. These
variables summarize the allocation/deallocation behavior seen so far:

— if both variables are false then there have been an equal number of
alternating allocations and deallocations;

— if safe1 is false and safe0 is true then there has been an additional
allocation;

— if safe1 is true and safe0 is false then there has been an additional
deallocation;

— finally, if both variables are true then there have been two or more
successive allocations or deallocations (this is the error state).

Part of the instrumentation is a new procedure UpdateState() that
updates the two shared variables safe1 and safe0 in accordance with the
allocation/deallocation that has occurred and the above-mentioned
protocol for updating these two variables. It is called every time a
successful allocation/deallocation occurs.
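
The two-bit protocol can be read as a four-state monitor. The sketch below is one reading of it, not the body of UpdateState() from the boolean program: the function name `update_state`, its argument order, and the treatment of the error state as absorbing are assumptions.

```python
from typing import Tuple


def update_state(safe1: bool, safe0: bool, is_alloc: bool) -> Tuple[bool, bool]:
    """Return the new (safe1, safe0) after one successful allocation or deallocation."""
    if safe1 and safe0:
        return True, True        # assumed: the error state is never left
    if is_alloc:
        if safe0:
            return True, True    # second allocation in succession
        if safe1:
            return False, False  # cancels the outstanding deallocation
        return False, True       # one additional allocation
    if safe1:
        return True, True        # second deallocation in succession
    if safe0:
        return False, False      # cancels the outstanding allocation
    return True, False           # one additional deallocation


def run(events: str) -> Tuple[bool, bool]:
    """Replay a string of events: 'a' is an allocation, 'd' a deallocation."""
    state = (False, False)
    for e in events:
        state = update_state(state[0], state[1], e == "a")
    return state
```

With this encoding the safety property becomes the invariant "safe1 and safe0 are never both true", which is a statement about global states only.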

3.4 Translation to LGFSM 

Since the boolean model does not have any recursion, it can easily be
transformed to a finite state model by inlining all procedure calls. An
LGFSM abstraction of Rockall was obtained by automatically inlining
the procedures of the boolean program. Local variables are used to
explicitly track important control locations in the boolean program (which
are implicit in the boolean program representation). The abstract LGFSM
for Rockall has eleven global variables and eight local variables. Let us
denote the set of global variables by $Y_P$ and the set of local variables by
$X_P$. We then have $m = |\Gamma_P| = 2^{|Y_P|} = 2048$ and
$n = |\Lambda_P| = 2^{|X_P|} = 256$.

4 The Beacon Tool 

The decidability result from Section 2 is of theoretical interest only, as it
is infeasible to explicitly check all trajectories of length $2m^{(n+1)!}$
even for small values of $m$ and $n$. We have implemented an algorithm which
has the effect of exploring all such trajectories but employs certain key
optimizations to reduce the amount of exploration required. In this section,
we present the algorithm and prove that the optimizations are sound and
complete. Although the algorithm could, in the worst case, still explore all
trajectories of length at most $2m^{(n+1)!}$, the optimizations seem to be
extremely effective in practice.

The Beacon tool was able to verify the desired safety property of
Rockall for an arbitrary number of threads. It ran on an 800 MHz
Pentium III machine with 512 MB of RAM and took about 240 minutes to
complete. In the process it explored roughly 2 million states. The
complexity result of Section 2 implies that (in the worst case) the algorithm
might check all trajectories of length at most $2 \times 2048^{257!}$. The
fact that Beacon managed to verify the property indicates that the
optimization techniques we employ might be quite effective in practice.²

4.1 The Algorithm 

We start by defining an alternate representation for the states of a PLS
$P_f$. As before, let $m = |\Gamma_P|$ and let $n = |\Lambda_P|$, with
$\Lambda_P = \{\lambda_1, \ldots, \lambda_n\}$. A state $\sigma$ of $P_f$,
for any $f \ge 1$, can be represented as an $(n + 1)$-tuple
$\theta \in \mathbb{N}^n \times \Gamma_P$, where the global states of
$\sigma$ and $\theta$ are the same, and for $1 \le i \le n$, the $i$-th
component of $\theta$ is equal to the number of times $\lambda_i$ occurs in
$\sigma$. Formally, we have (1) $\theta(n + 1) = \sigma(f + 1)$, and (2) for
$1 \le i \le n$, $\theta(i)$ is equal to the number of occurrences of
$\lambda_i$ in $\sigma$. The advantage of this alternate representation
is that it provides a uniform way to represent the states of $P_f$ for all
$f$.
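
The change of representation amounts to counting how many threads sit in each local state. A minimal sketch, assuming states are tuples of strings and that `local_states` fixes the enumeration $\lambda_1, \ldots, \lambda_n$ (both assumptions of this example):

```python
from collections import Counter
from typing import Sequence, Tuple


def to_counts(state: Sequence[str], local_states: Sequence[str]) -> Tuple:
    """Map (l_1, ..., l_f, g) to (count of lambda_1, ..., count of lambda_n, g)."""
    locs, g = state[:-1], state[-1]
    occurrences = Counter(locs)
    return tuple(occurrences[l] for l in local_states) + (g,)


LOCALS = ["idle", "busy"]
theta = to_counts(("idle", "busy", "idle", "held"), LOCALS)
```

States that differ only in which thread is in which local state collapse to the same tuple, and the tuple length no longer depends on $f$.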

Representing Infinite Sets of States with Configurations. The
number of reachable states of $P_f$ for all $f$ is potentially infinite. We
use the following trick to represent certain infinite sets of states. We
allow a special symbol $*$ in our state representation to implicitly
represent the set of all natural numbers. Formally, a configuration is an
element of the set $(\mathbb{N} \cup \{*\})^n \times \Gamma_P$. Note that
every state is a configuration. A configuration $\theta$ which contains one or
more occurrences of $*$ is interpreted to represent the infinite set of
states obtained by replacing each occurrence of $*$ by some natural number.
For example, if $n = 4$, then the configuration $(3, *, 0, *, g)$ represents
the set of states $\{(3, i, 0, j, g) \mid i \in \mathbb{N}, j \in \mathbb{N}\}$.
Note that we cannot use this trick to represent every infinite set of states
compactly. For example, we cannot represent the set of states
$\{(3, 2i, 0, 5, g) \mid i \in \mathbb{N}\}$ using a configuration.
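
Membership of a state in the set denoted by a configuration is a componentwise check. The sketch below assumes the string `"*"` as the encoding of the special symbol and tuples for states and configurations; the name `represents` is hypothetical.

```python
from typing import Sequence

STAR = "*"  # assumed encoding of the special symbol


def represents(config: Sequence, state: Sequence) -> bool:
    """True if `state` is obtained from `config` by replacing each * by a number."""
    if len(config) != len(state) or config[-1] != state[-1]:
        return False
    return all(c == STAR or c == s for c, s in zip(config[:-1], state[:-1]))


config = (3, STAR, 0, STAR, "g")
```

For the example configuration, `(3, 7, 0, 2, "g")` is a member while `(3, 7, 1, 2, "g")` is not, since the third count is fixed at 0.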

We define two unary operators $Inc$ and $Dec$ over the domain
$\mathbb{N} \cup \{*\}$. If $k \in \mathbb{N}$ then $Inc(k) = k + 1$ and
$Dec(k) = k - 1$. For $k = *$, we have $Inc(*) = Dec(*) = *$. Let
$\theta_1 = (k_1, k_2, \ldots, k_i, \ldots, k_j, \ldots, k_n, g)$ be a
configuration. Consider $i, j$ such that $k_i > 0$ and
$\tau = ((\lambda_i, g), (\lambda_j, g')) \in T_P$. Then, the image of
$\theta_1$ under $\tau$ is defined as

$$Image(\theta_1, \tau) = (k_1, k_2, \ldots, Dec(k_i), \ldots, Inc(k_j), \ldots, k_n, g')$$

² We had initially attempted to verify the safety property for a fixed number
of threads of the LGFSM using SMV [McM]. We wrote descriptions of the
composition of a fixed number of threads of the LGFSM in the SMV language and
tried to model check the safety property using Cadence's SMV tool. However,
the tool was unable to verify the property for more than 4 threads when run
on the above-mentioned machine.

We note that the image operator is distributive with respect to the
states in a configuration. That is, $Image(\theta_1, \tau)$ exactly represents
the set
$\{\sigma_2 \mid \exists \sigma_1 \in \theta_1.\ \sigma_2 = Image(\sigma_1, \tau)\}$.

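We extend the comparison operators $\le$ and $<$ to operate over the
natural numbers extended with $*$. Let $\le_{\mathbb{N}}$ and
$<_{\mathbb{N}}$ be the usual comparison operators in $\mathbb{N}$. Let
$i, j$ be in $\mathbb{N} \cup \{*\}$. We say that $i \le j$ if (1) $j = *$, or
(2) $i, j \in \mathbb{N}$ and $i \le_{\mathbb{N}} j$. We say that $i < j$ if
(1) $j = *$ and $i \in \mathbb{N}$, or (2) $i, j \in \mathbb{N}$ and
$i <_{\mathbb{N}} j$.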

Given two configurations $\theta_1$ and $\theta_2$, we say that $\theta_2$
covers $\theta_1$, written $\theta_1 \le \theta_2$, if (1)
$\theta_1(n + 1) = \theta_2(n + 1)$, and (2) for every $1 \le i \le n$, we
have that $\theta_1(i) \le \theta_2(i)$. We say that $\theta_2$ dominates
$\theta_1$, written $\theta_1 < \theta_2$, if (1) $\theta_1 \le \theta_2$,
and (2) for some $1 \le i \le n$, we have that $\theta_1(i) < \theta_2(i)$.
Note that if $\theta_1 \le \theta_2$, then all the global states reachable
from $\theta_1$ are also reachable from $\theta_2$.

Let $\theta_1$ and $\theta_2$ be two configurations such that
$\theta_1 < \theta_2$. Then, we define $Closure(\theta_1, \theta_2)$ to be
the configuration $\theta_3$ obtained in the following way:

— $\theta_3(n + 1) = \theta_1(n + 1) = \theta_2(n + 1)$, and

— for every $1 \le i \le n$, if $\theta_1(i) = \theta_2(i)$ then
$\theta_3(i) = \theta_1(i)$, otherwise $\theta_3(i) = *$.
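
The operators on configurations defined above translate almost literally into code. The following sketch is illustrative only: it assumes configurations are tuples whose last component is the global state, the string `"*"` as the special symbol, and 0-based indices for local states; all function names are hypothetical.

```python
from typing import Tuple

STAR = "*"  # assumed encoding of the special symbol


def inc(k):
    return STAR if k == STAR else k + 1


def dec(k):
    return STAR if k == STAR else k - 1


def leq(i, j) -> bool:
    """i <= j over the naturals extended with *."""
    return j == STAR or (i != STAR and i <= j)


def lt(i, j) -> bool:
    """i < j over the naturals extended with *."""
    return (j == STAR and i != STAR) or (i != STAR and j != STAR and i < j)


def image(theta: Tuple, i: int, j: int, g_new) -> Tuple:
    """Image under ((lambda_i, g), (lambda_j, g')); the caller checks theta[i] > 0 or *."""
    counts = list(theta[:-1])
    counts[i] = dec(counts[i])
    counts[j] = inc(counts[j])
    return tuple(counts) + (g_new,)


def covers(t1: Tuple, t2: Tuple) -> bool:
    """t1 <= t2: t2 covers t1."""
    return t1[-1] == t2[-1] and all(leq(a, b) for a, b in zip(t1[:-1], t2[:-1]))


def dominates(t1: Tuple, t2: Tuple) -> bool:
    """t1 < t2: t2 dominates t1."""
    return covers(t1, t2) and any(lt(a, b) for a, b in zip(t1[:-1], t2[:-1]))


def closure(t1: Tuple, t2: Tuple) -> Tuple:
    """Keep the counts on which t1 and t2 agree and replace the others by *."""
    assert dominates(t1, t2)
    return tuple(a if a == b else STAR for a, b in zip(t1[:-1], t2[:-1])) + (t1[-1],)
```

For instance, `closure((1, 0, "g"), (2, 0, "g"))` gives `("*", 0, "g")`: the count that grew is generalized to an arbitrary number of threads, and the count that stayed at 0 is kept.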



The Algorithm and Its Properties. Figure 1 presents our algorithm
for the parameterized reachability problem. The algorithm constructs a
reachability graph $(Reach_V, Reach_E)$, where $Reach_V$ is a set of
vertices and $Reach_E$ is a set of directed edges. Each vertex in $Reach_V$
is a configuration (we use the terms "vertex" and "configuration"
interchangeably in the ensuing description). We maintain a worklist of
unexplored configurations. The worklist is initialized with the initial
configuration. The algorithm proceeds by picking a configuration $c$ from the
worklist and investigating every transition $\tau$ enabled in $c$ (which
leads to a configuration $d$). If $d$ is covered by an existing reachable
configuration $a$ then no new global states can be reached from $d$ that
could not be reached from $a$, so $d$ is "dropped". Instead, if $d$ dominates
a configuration $a$ from which $d$ is reachable then a compression step is
possible (lines [5-8]). Otherwise, $d$ is added to the set of reachable
configurations and is added to the worklist.
Three properties remain to be proved about this algorithm: 

— Completeness: Every reachable state in Pf for all / is contained in 
some configuration reached by the algorithm. 
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WorkList:— where let &p — {h,g) in 
6»(n + l) = g, 

0{i) = *, and 

9{j) = 0 for 1 < j < n, j / i 
Reackv’.— WorkList 
Reache’.^ {} 

while {Nonempty {WorkList)) do 
c := Remove { WorkList) 
foreach transition r enabled in c 

[1] d := Image{c,T) 

[2] if there exists a vertex a G Reaehy such that d < a then 

[3] drop d and do nothing 

[4] elsif there exists a vertex a G Reaehy such that a < d and 

there is a path from a to d through edges in Reaehe then 

[5] e := Closure{a,d) 

let V be the set of vertices reachable so far from a (excluding a) in 
delete vertices from V from WorkList and Reaehy 

[6] delete edges connecting to/from vertices in V from Reaehe 

[7] replace a with e in Reaehy and Reaehe 

[8] add e to WorkList 
else 

[9] Reaehy Reaehy U {d} 

[10] Reaehe'.— Reaehe U (c, d) 

[11] add d to WorkList 

if; 

endfor 

endwhile 



Fig. 1. Algorithm for global state reachability in a PLS. 

— Soundness: Every state contained in configurations reached by the
algorithm is reachable in $P^f$ for some $f$.

— Termination: The algorithm terminates. 

The proofs of these properties are similar to the proofs for the minimal
coverability graph algorithm for Petri nets presented in [Fin93]. They
are presented in [BCR00] and omitted here for brevity.
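The covering test of step [2] and the compression of step [5] can be sketched as follows. This is only an illustration under assumptions that this section does not spell out: a configuration is encoded as a tuple of per-local-state thread counts followed by the global state, `OMEGA` stands for the symbol *, and `closure` is modeled on the usual Karp-Miller acceleration (counts that strictly grew are replaced by *). The names `covers` and `closure` are hypothetical.

```python
# Hypothetical encoding: (count_1, ..., count_n, global_state).
OMEGA = float("inf")  # stands for "*": arbitrarily many threads


def covers(a, d):
    """True if d <= a: same global state and each count of d is at most a's."""
    *a_counts, a_global = a
    *d_counts, d_global = d
    return a_global == d_global and all(
        x <= y for x, y in zip(d_counts, a_counts)
    )


def closure(a, d):
    """Assumed acceleration: counts that strictly grew from a to d become OMEGA."""
    *a_counts, _ = a
    *d_counts, d_global = d
    grown = tuple(OMEGA if y > x else y for x, y in zip(a_counts, d_counts))
    return grown + (d_global,)


dropped = covers((OMEGA, 1, "g"), (3, 1, "g"))   # d is covered, so it is dropped
compressed = closure((1, 0, "g"), (2, 0, "g"))   # first count grew, so it becomes *
```

Using an infinite float for * keeps the comparison in `covers` uniform, since every finite count compares below it.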

4.2 Implementation Details 

Below we summarize some key features of the implementation of the 
Beacon tool: 

— Beacon constructs a reachability tree instead of a graph by ensuring
that the same state is not explored more than once. Maintaining a
tree makes it much easier to perform the check in step [4] since there
can be at most one trajectory between two vertices in a directed tree.

— The reachability tree is constructed in a depth-first manner. We are
currently experimenting with a breadth-first implementation.

— We represent * by the largest unsigned integer. While computing the 
image in step [1] we check for overflows. In our experiments we have 
found that the non-zero local state counts are either * or small inte- 
gers. 

— The representation of * as a finite integer coupled with the overflow 
check automatically puts a bound on the length of any explored tra- 
jectory, and hence on the running time of Beacon. The bound on the 
length of the trajectory is much smaller than what is required by the 
result of section 2 but we have found it to be more than sufficient for 
Rockall. This bound can be increased to an arbitrary level simply by 
using a larger value for *. 

— A configuration could be represented as an array of n unsigned inte- 
gers. However we discovered that most of these counts are actually 
zero in the explored states. To reduce space requirements, we use a 
sparse representation where we only maintain the non-zero local state 
counts along with the corresponding local states. 
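The last three points can be illustrated with a small sketch. It assumes a 32-bit word for the value standing in for * (the text only says "the largest unsigned integer"), stores a configuration as a dictionary holding just the non-zero counts, and uses hypothetical names (`STAR`, `CountOverflow`, `add_threads`); it is not Beacon's actual code.

```python
STAR = 2**32 - 1  # assumed word size; stands for "*"


class CountOverflow(Exception):
    """Raised when a finite count would reach the value reserved for *."""


def add_threads(config, local_state, delta):
    """Return a new sparse configuration with delta threads added to local_state."""
    count = config.get(local_state, 0)
    if count == STAR:
        return dict(config)            # * absorbs any finite change
    new_count = count + delta
    if new_count < 0:
        raise ValueError("transition not enabled")
    if new_count >= STAR:
        raise CountOverflow(local_state)
    result = dict(config)
    if new_count == 0:
        result.pop(local_state, None)  # keep only non-zero counts
    else:
        result[local_state] = new_count
    return result


after_leave = add_threads({"idle": 1}, "idle", -1)
after_star = add_threads({"idle": STAR}, "idle", -1)
```

Dropping zero entries keeps the dictionary proportional to the number of occupied local states rather than to n.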

5 Related Work 

Petri nets (PNs) [Pet62] were introduced in 1962 by C. A. Petri in his 
doctoral dissertation. A few years later, Karp and Miller [KM69] inde- 
pendently proposed Vector Addition Systems (VASs) for analyzing the 
properties of parallel program schemata. Ultimately it was realized that 
they are mathematically equivalent. An excellent survey of PNs, VASs, 
and various decidability issues relating to them can be found in [EN94]. 
Over the years several other models were proposed for representing infinite 
state systems. Many of them, like timed PNs were extensions to PNs, and 
some, like VASSs, were shown to be mathematically equivalent to VASs. 
There has been a lot of interesting work on decidability of problems like 
reachability and coverability for infinite-state systems [ACJYK96, AJ97]. 
Very recently, there has been a remarkable attempt at trying to unify a 
diverse set of infinite-state systems having similar decidability properties 
under a single framework of well-structured transition systems [FS00]. 

The coverability problem for VASs has been known to be decidable
since [KM69]. But the algorithm proposed there is notorious for its
complexity. It involves the construction of a coverability tree, and might
require non-primitive recursive space in the worst case. Lipton [Lip76]
proved that deciding the coverability problem for VASs requires at least
exponential space in the size of the VAS. More specifically, Lipton showed
that for some constant $d > 0$, the problem cannot be decided in space
$2^{d\sqrt{n}}$. His lower bounds are valid even if one only considers input whose
vectors have components of value -1, 0, or 1. Nobody has been able to
propose an algorithm that matches Lipton’s lower bound. Rackoff [Rac78]
gave a near-optimal algorithm that requires space bounded by an
exponential of $n \log n$, where $n$ is the size of the VAS. Unfortunately,
Rackoff’s algorithm is impractical even for VASs of moderate size. According
to [FS00], all implemented algorithms for the coverability problem
[Fin90, Fin93] use Karp and Miller’s coverability tree, or the coverability
graph, or some complex forward-based method. The work most related to
ours is the construction of the minimal coverability graph for PNs given
by Finkel [Fin93]. To the best of our knowledge, this approach has not
been applied to the parameterized verification of multi-threaded software
libraries, and has not succeeded on a design as large as Rockall. The
Petri net for the PNCSA communication protocol used in [Fin93], for
example, has only 31 places and 36 transitions.

The link between PNs and parameterized networks has also been 
known for a long time. German and Sistla investigated temporal logic 
model checking of parameterized networks [GS92]. Out of the two models
presented by them, one is comparable to PLS. The algorithm they
present for this model is based on Rackoff’s algorithm and has
double-exponential time complexity. There has also been significant research on
model checking of programs written in languages like Java which support
multi-threading [CDH+00, HP00]. These approaches however
concentrate on general Java programs and do not consider arbitrary numbers
of threads. They impose an a priori bound on the number of threads in
order to do model checking.

6 Conclusion and Future Work 

In this paper, we have presented a model called LGFSM for representing 
multi-threaded libraries. Using the model, we have been able to extend 
well-known complexity results and algorithms from the domain of PNs 
and VASs to multi-threaded software libraries. We have implemented our 
algorithm in a tool called Beacon and use it to verify critical safety prop- 
erties of an industrial-strength memory manager called Rockall. Below 
we summarize some interesting and challenging research directions: 
— The current implementation of Beacon could be optimized further. In
particular, it would be interesting to see if data structures employed in
similar algorithms for verification of cache coherence protocols [EN96,
Del00] can be used in the domain of LGFSMs.

— As mentioned before, we believe that in most concurrent programs
the interaction between threads is regular and can be captured using
finite state machines. One of the major challenges in software model
checking is extracting this finite state behavior (sometimes called a 
synchronization skeleton) from concurrent program descriptions. Of- 
ten the actual program description is too large to be verified, and the 
synchronization skeleton is sufficient to decide the property of interest. 
We are interested in extracting such finite state models automatically 
and efficiently. 

— Another challenging problem is to efficiently check refinement between 
a LGFSM and a boolean program. The motive behind doing this is 
that if we prove a safety property about a LGFSM and then prove 
that the LGFSM is refined by a C program, we could conclude that 
the safety property holds for the C program also. 

— Finally we would also like to develop parameterized verification tech- 
niques for other, slightly more relaxed models. For example we would 
like to model PLS where the threads have a sense of identity of them- 
selves and others, say through a thread identifier. 

Acknowledgement 

We thank Michael Parkes for giving us access to Rockall, and for labori- 
ously explaining its internal details. We also thank Giorgio Delzanno for 
useful comments and suggestions. 



References 



[ACJYK96] P. A. Abdulla, K. Cerans, B. Jonsson, and T. Yih-Kuen. General
decidability theorems for infinite-state systems. LICS ’96: 11th IEEE Symp.
Logic in Computer Science, pages 313-321, July 1996.

[AJ97] P. A. Abdulla and B. Jonsson. Ensuring completeness of symbolic
verification methods for infinite-state systems. Theoretical Computer
Science, 1997.

[BCR00] Thomas Ball, Sagar Chaki, and Sriram K. Rajamani. Parameterized
verification of multithreaded software libraries. Technical Report
MSR-TR-2000-116, Microsoft Research, December 2000.

[BR00a] T. Ball and S. K. Rajamani. Bebop: A symbolic model checker for
boolean programs. SPIN 00: SPIN Workshop, Lecture Notes in Computer
Science 1885, pages 113-130. Springer-Verlag, 2000.

[BR00b] T. Ball and S. K. Rajamani. Boolean programs: A model and process
for software analysis. Technical Report MSR-TR-2000-14, Microsoft
Research, February 2000.

[CDH+00] James Corbett, Matthew Dwyer, John Hatcliff, Corina Pasareanu,
Robby, Shawn Laubach, and Hongjun Zheng. Bandera: Extracting
finite-state models from Java source code. ICSE 2000: International
Conference on Software Engineering, 2000.

[Del00] G. Delzanno. Automatic Verification of Parameterized Cache Coherence
Protocols. CAV 00: Computer Aided Verification, Lecture Notes in
Computer Science 1855, pages 53-68. Springer-Verlag, 2000.

[EN94] J. Esparza and M. Nielsen. Decidability issues for Petri nets - a survey.
Journal of Information Processing and Cybernetics, 30(3):143-160, 1994.

[EN96] E. A. Emerson and K. S. Namjoshi. Automatic Verification of
Parameterized Synchronous Systems. CAV 96: Computer Aided Verification,
Lecture Notes in Computer Science 1102, pages 87-98. Springer-Verlag,
1996.

[Fin90] A. Finkel. Reduction and covering of infinite reachability trees.
Information and Computation, 89:144-179, 1990.

[Fin93] A. Finkel. The minimal coverability graph for Petri nets. Advances in
Petri Nets, Lecture Notes in Computer Science, 674:210-243, 1993.

[FS00] A. Finkel and Ph. Schnoebelen. Well-structured transition systems
everywhere! Theoretical Computer Science, 2000. To appear.

[GS92] S. M. German and A. P. Sistla. Reasoning about systems with many
processes. JACM, 39(3), July 1992.

[HP00] K. Havelund and T. Pressburger. Model checking Java programs using
JavaPathFinder. STTT: International Journal on Software Tools for
Technology Transfer, 2(4), April 2000.

[KM69] R. M. Karp and R. E. Miller. Parallel program schemata. Journal of
Computer and System Sciences, 3:147-195, 1969.

[Lip76] R. J. Lipton. The reachability problem requires exponential space.
Technical report, Department of Computer Science, Yale University, 1976.

[McM] K. L. McMillan. http://www-cad.eecs.berkeley.edu/~kenmcmil.

[Pet62] C. Petri. Fundamentals of a theory of asynchronous information flow.
Information Processing 62, Proceedings of the 1962 IFIP Congress,
pages 386-390, 1962.

[Rac78] C. Rackoff. The covering and boundedness problem for vector addition
systems. Theoretical Computer Science, 6:223-231, 1978.

[Ram99] G. Ramalingam. Context sensitive synchronization sensitive analysis is
undecidable. Technical Report RC21493, IBM T.J. Watson Research,
May 1999.




Efficient Guiding Towards Cost-Optimality in UPPAAL*

Gerd Behrmann^1, Ansgar Fehnker^3, Thomas Hune^2, Kim Larsen^4,
Paul Pettersson^5, and Judi Romijn^3

^1 Basic Research in Computer Science, Aalborg University,
   E-mail: behrmann@cs.auc.dk
^2 Basic Research in Computer Science, Aarhus University,
   E-mail: baris@brics.dk
^3 Computing Science Institute, University of Nijmegen,
   E-mail: [ansgar,judi]@cs.kun.nl
^4 Department of Computer Science, University of Twente,
   E-mail: kgl@cs.auc.dk
^5 Department of Computer Systems, Information Technology,
   Uppsala University, E-mail: paupet@docs.uu.se



Abstract. In this paper we present an algorithm for efficiently computing
the minimum cost of reaching a goal state in the model of Uniformly
Priced Timed Automata (UPTA). This model can be seen as a submodel
of the recently suggested model of linearly priced timed automata, which
extends timed automata with prices on both locations and transitions.
The presented algorithm is based on a symbolic semantics of UPTA, and
an efficient representation and operations based on difference bound
matrices. In analogy with Dijkstra’s shortest path algorithm, we show that
the search order of the algorithm can be chosen such that the number of
symbolic states explored by the algorithm is optimal, in the sense that
the number of explored states cannot be reduced by any other search
order. We also present a number of techniques inspired by branch-and-bound
algorithms which can be used for limiting the search space and
for quickly finding near-optimal solutions.

The algorithm has been implemented in the verification tool UPPAAL.
When applied to a number of experiments the presented techniques
reduced the explored state-space by up to 90%.



1 Introduction 

Recently, formal verification tools for real-time and hybrid systems, such as
UPPAAL [LPY97], Kronos [BDM+98] and HyTech [HHWT97], have been applied
to solve realistic scheduling problems [Feh99b,HLP00,NY99]. The basic common
idea of these works is to reformulate a scheduling problem to a reachability
problem that can be solved by verification tools. In this approach, the automata 
based modeling languages of the verification tools serve as the input language in 
which the scheduling problem is described. These modeling languages have been 
found to be very well-suited in this respect, as they allow for easy and flexible 
modeling of systems consisting of several parallel components that interact in a 
time-critical manner and constrain the behavior of each other in a multitude of 
ways. 

A main difference between verification algorithms and dedicated scheduling 
algorithms is in the way they search a state-space to find solutions. Scheduling 
algorithms are often designed to find optimal (or near optimal) solutions and 
are therefore based on techniques such as branch-and-bound to identify and 
prune parts of the state-space that are guaranteed not to contain any optimal
solutions. In contrast, verification algorithms normally do not support any notion
of optimality and are designed to explore the entire state-space as efficiently as 
possible. The verification algorithms that do support notions of optimality are 
restricted to simple trace properties such as shortest trace [LPY95], or shortest 
accumulated delay in trace [NTY00]. 

In this paper we aim at reducing the gap between scheduling and verification 
algorithms by adopting a number of techniques used in scheduling algorithms 
in the verification tool UPPAAL. In doing so, we study the problem of efficiently 
computing the minimal cost of reaching a goal state in the model of Uniformly 
Priced Timed Automata (UPTA). This model can be seen as a restricted version 
of the recently suggested model of Linearly Priced Timed Automata (LPTA) 
[BFH+01], which extends the model of timed automata with prices on all tran- 
sitions and locations. In these models, the cost of taking an action transition is 
the price associated with the transition, and the cost of delaying d time units in 
a location is d • p, where p is the price associated with the location. The cost of a 
trace is simply the accumulated sum of costs of its delay and action transitions. 
The objective is to determine the minimum cost of traces ending in a goal state. 

The infinite state-spaces of timed automata models necessitate the use of
symbolic techniques in order to simultaneously handle sets of states (so-called
symbolic states). For pure reachability analysis, tools like UPPAAL and
Kronos use symbolic states of the form $(l, Z)$, where $l$ is a location of the timed
automaton and $Z \subseteq \mathbb{R}^C_{\ge 0}$^1 is a convex set of clock valuations called a zone. For
the computation of minimum costs of reaching goal states, we suggest the use of
symbolic cost states of the form $(l, C)$, where $C : \mathbb{R}^C_{\ge 0} \to (\mathbb{R}_{\ge 0} \cup \{\infty\})$ is a cost
function mapping clock valuations to real valued costs or $\infty$. The intention is
that, whenever $C(u) < \infty$, reachability of the symbolic cost state $(l, C)$ should
ensure that the state $(l, u)$ is reachable with cost $C(u)$.

Using the above notion of symbolic cost states, an abstract algorithm for 
computing the minimum cost of reaching a goal state satisfying $\varphi$ of a uniformly 

^1 $C$ denotes the set of clocks of the timed automata, and $\mathbb{R}^C_{\ge 0}$ denotes the set of functions
from $C$ to $\mathbb{R}_{\ge 0}$.
Cost := ∞
Passed := ∅
Waiting := {(l_0, C_0)}
while Waiting ≠ ∅ do
    select (l, C) from Waiting
    if (l, C) ⊨ φ and min(C) < Cost then
        Cost := min(C)
    if for all (l, C') in Passed: C' ⋢ C then
        add (l, C) to Passed
        for all (m, D) such that (l, C) → (m, D): add (m, D) to Waiting
return Cost

Fig. 1. Abstract Algorithm for the Minimal-Cost Reachability Problem. 



priced timed automaton is shown in Fig. 1. The algorithm is similar to a stan- 
dard state-space traversal algorithm that uses two data-structures WAITING and 
Passed to store states waiting to be examined, and states already explored, re- 
spectively. Initially, Passed is empty and WAITING holds an initial (symbolic 
cost) state. In each iteration, the algorithm proceeds by selecting a state $(l, C)$
from Waiting, checking that none of the previously explored states $(l, C')$ has
a “smaller” cost function, written $C' \sqsubseteq C$^2, and if this is the case, adds it to
Passed and its successors to Waiting. In addition the algorithm uses the global
variable Cost, which is initially set to oo and updated whenever a goal state is 
found that can be reached with a lower cost than the current value of Cost. The 
algorithm terminates when WAITING is empty, i.e. when no further states are 
left to be examined. Thus, the algorithm always searches the entire state-space 
of the analyzed automaton. 
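The loop of Fig. 1 can be sketched in Python over a finite toy state space. The sketch rests on strong simplifying assumptions that are not part of the paper: a symbolic cost state is a pair (location, cost), i.e. a cost function whose support is a single valuation, so that the comparison between cost functions degenerates to comparing two numbers at the same location. The graph `EDGES` and all function names are hypothetical.

```python
import math

# Hypothetical toy model: location -> list of (target, price).
EDGES = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": []}


def successors(state):
    location, cost = state
    return [(target, cost + price) for target, price in EDGES[location]]


def is_goal(state):
    return state[0] == "b"


def min_cost(state):
    return state[1]


def dominated_by(state, old):
    # "old is smaller than state" in the toy model: same location, no more expensive.
    return old[0] == state[0] and old[1] <= state[1]


def min_cost_reachability(initial):
    cost = math.inf
    passed = []
    waiting = [initial]
    while waiting:
        state = waiting.pop()  # arbitrary selection order
        if is_goal(state) and min_cost(state) < cost:
            cost = min_cost(state)
        if not any(dominated_by(state, old) for old in passed):
            passed.append(state)
            waiting.extend(successors(state))
    return cost


result = min_cost_reachability(("a", 0))
```

As in Fig. 1, the loop only stops once the waiting list is empty, so every non-dominated state is visited even after the cheapest goal state has been seen.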

In [BFH+01] an algorithm for computing the minimal cost of reaching desig- 
nated goal states was given for the full model of LPTA. However, the algorithm 
is based on a cost-extended version of regions, and is thus guaranteed to be 
extremely inefficient and highly sensitive to the size of constants used in the 
models. As the first contribution of this paper, we give for the subclass of UPTA 
an efficient zone representation of symbolic cost states based on Difference Bound 
Matrices [Dil89], and give all the necessary symbolic operators needed to imple- 
ment the algorithm. As the second contribution we show that, in analogy with 
Dijkstra’s shortest path algorithm, if the algorithm is modified to always select 
from Waiting the (symbolic cost) state with the smallest minimum cost, the 
state-space exploration may terminate as soon as a goal state is explored. This 
means that we can solve the minimal-cost reachability problem without neces- 
sarily searching the entire state-space of the analyzed automaton. In fact, it can 
even be shown that the resulting algorithm is optimal in the sense that choos- 
ing to search a symbolic cost state with non-minimal minimum cost can never 
reduce the number of symbolic cost states explored. 

The third contribution of this paper is a number of techniques inspired by 
branch-and-bound algorithms [AC91] that have been adopted in making the 



^2 Formally, $C' \sqsubseteq C$ iff $\forall u.\ C'(u) \le C(u)$.
algorithm even more useful. These techniques are particularly useful for limiting
the search space and for quickly finding solutions near to the minimum cost of
reaching a goal state. To support this claim, we have implemented the algorithm
in an experimental version of the verification tool UPPAAL and applied it to
a wide variety of examples. Our experimental findings indicate that in some 
cases as much as 90% of the state-space searched in ordinary breadth-first order 
can be avoided by combining the techniques presented in this paper. Moreover, 
the techniques have allowed pure reachability analysis to be performed in cases 
which were previously unsuccessful. 

The rest of this paper is organized as follows: In Section 2 we formally define 
the model of uniformly priced timed automata and give the symbolic semantics. 
In Section 3 we present the basic algorithm and the branch-and-bound inspired 
techniques. The experiments are presented in Section 4. We conclude the paper 
in Section 5. 

2 Uniformly Priced Timed Automata 

In this section linearly priced timed automata are formalized and their seman- 
tics are defined. The definitions given here resemble those of [BFH+01], except 
that the symbolic semantics uses cost functions whereas [BFH+01] uses priced 
regions. Zone-based data-structures for compact representation and efficient ma- 
nipulation of cost functions are provided for the class of uniformly priced timed 
automata. It is simple to extend linearly priced timed automata to networks of 
linearly priced timed automata, but for brevity parallel composition is omitted 
here. 



2.1 Linearly Priced Timed Automata 

Formally, linearly priced timed automata (LPTA) are timed automata with 
prices on locations and transitions. We also denote prices on locations as rates. 
Let $C$ be a set of clocks. Then $\mathcal{B}(C)$ is the set of formulas that are
conjunctions of atomic constraints of the form $x \bowtie n$ and $x - y \bowtie n$ for $x, y \in C$,
$\bowtie \in \{<, \le, =, \ge, >\}$ and $n$ being a natural number. Elements of $\mathcal{B}(C)$ are called
clock constraints over $C$. $\mathcal{P}(C)$ denotes the power set of $C$.

Definition 1 (Linearly Priced Timed Automata). A linearly priced timed
automaton $A$ over clocks $C$ and actions $Act$ is a tuple $(L, l_0, E, I, P)$ where $L$ is
a finite set of locations, $l_0$ is the initial location, $E \subseteq L \times \mathcal{B}(C) \times Act \times \mathcal{P}(C) \times L$
is the set of edges, where an edge contains a source, a guard, an action, a set
of clocks to be reset, and a target, $I : L \to \mathcal{B}(C)$ assigns invariants to locations,
and $P : (L \cup E) \to \mathbb{N}$ assigns prices to both locations and edges. In the case of
$(l, g, a, r, l') \in E$, we write $l \xrightarrow{g,a,r} l'$.

Clock values are represented as functions called clock valuations from $C$ to
the non-negative reals $\mathbb{R}_{\ge 0}$. We denote by $\mathbb{R}^C_{\ge 0}$ the set of clock valuations for $C$.

Fig. 2. An example of an LPTA with two clocks, x and y. The number in the
states is the rate of the state and the number on the transitions is the cost of
taking the transition. A minimal trace to the rightmost state needs to visit the
initial state twice, and has cost 14.



Definition 2 (Semantics). The semantics of a linearly priced timed automaton
$A$ is defined as a labeled transition system with the state-space $L \times \mathbb{R}^C_{\ge 0}$, with
initial state $(l_0, u_0)$ (where $u_0$ assigns zero to all clocks in $C$), and with the
following transition relation:

— $(l, u) \xrightarrow{\epsilon(d),\,p} (l, u + d)$ if $\forall\, 0 \le e \le d : u + e \in I(l)$, and $p = d \cdot P(l)$,

— $(l, u) \xrightarrow{a,\,p} (l', u')$ if there exist $g, r$ s.t. $l \xrightarrow{g,a,r} l'$, $u \in g$, $u' = u[r \mapsto 0]$, and
$p = P((l, g, a, r, l'))$,

where for $d \in \mathbb{R}_{\ge 0}$, $u + d$ maps each clock $x$ in $C$ to the value $u(x) + d$, and
$u[r \mapsto 0]$ denotes the clock valuation which maps each clock in $r$ to the value 0
and agrees with $u$ over $C \setminus r$.

The transitions are decorated with a delay-quantity or an action, together with
the cost of the transition. The cost of an execution trace is simply the
accumulated cost of all transitions in the trace, see Fig. 2.

Definition 3 (Cost). Let $\alpha = (l_0, u_0) \xrightarrow{a_1,\,p_1} \cdots \xrightarrow{a_n,\,p_n} (l_n, u_n)$ be a
finite execution trace. The cost of $\alpha$, $cost(\alpha)$, is the sum $\sum_{i=1}^{n} p_i$. For a given
state $(l, u)$ the minimum cost $mincost(l, u)$ of reaching the state is the infimum
of the costs of finite traces ending in $(l, u)$. For a given location $l$ the minimum
cost $mincost(l)$ of reaching the location is the infimum of the costs of finite
traces ending in $(l, u)$ for some $u$.
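As a small illustration of Definitions 2 and 3, the cost of a concrete trace can be accumulated step by step: a delay of d in location l contributes d · P(l), and an action transition contributes the price of its edge. The automaton below is made up for the example (it is not the one in Fig. 2), and the step encoding and the name `trace_cost` are assumptions of this sketch.

```python
def trace_cost(steps, location_price, edge_price):
    """Sum the prices of a trace given as ('delay', l, d) / ('action', edge) steps."""
    total = 0
    for step in steps:
        if step[0] == "delay":
            _, location, d = step
            total += d * location_price[location]  # p = d * P(l)
        else:
            _, edge = step
            total += edge_price[edge]              # p = P(edge)
    return total


# Hypothetical two-location automaton with rates 2 and 0 and one edge of price 3.
location_price = {"l0": 2, "l1": 0}
edge_price = {("l0", "go", "l1"): 3}
steps = [
    ("delay", "l0", 1.5),
    ("action", ("l0", "go", "l1")),
    ("delay", "l1", 4),
]
cost = trace_cost(steps, location_price, edge_price)  # 1.5 * 2 + 3 + 4 * 0
```

The sketch does not check guards or invariants; it only shows how the prices accumulate along a trace that is assumed to be valid.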



2.2 Cost Functions 

The semantics of LPTA yields an uncountable state-space and is therefore not
suited for state-space exploration algorithms. To overcome this problem, the
algorithm in Fig. 1 uses symbolic cost states, quite similar to how timed automata
model checkers like UPPAAL use symbolic states.

Typically, symbolic states are pairs of the form $(l, Z)$, where $Z \subseteq \mathbb{R}^C_{\ge 0}$ is a
convex set of clock valuations, called a zone, representable by Difference Bound
Matrices (DBMs) [Dil89]. The operations needed for forward state-space
exploration can be efficiently implemented using the DBM data-structure. In the
priced setting we must in addition represent the costs with which individual
Table 1. Common operations on cost functions.

Operation      Cost Function ($\mathbb{R}^C_{\ge 0} \to \mathbb{R}_{\ge 0}$)
Delay          $delay(C, p) : u \mapsto \inf\{C(v) + p \cdot d \mid d \in \mathbb{R}_{\ge 0} \wedge v + d = u\}$
Reset          $r(C) : u \mapsto \inf\{C(v) \mid u = r(v)\}$
Satisfaction   $g(C) : u \mapsto \min\{C(v) \mid v \models g \wedge u = v\}$
Increment      $C + k : u \mapsto C(u) + k$, $k \in \mathbb{N}$
Comparison     $D \sqsubseteq C \iff \forall u : D(u) \le C(u)$
Infimum        $min(C) = \inf\{C(u) \mid u \in \mathbb{R}^C_{\ge 0}\}$



states are reached. For this we suggest the use of symbolic cost states, $(l, C)$,
where $C$ is a cost function mapping clock valuations to real valued costs. Thus,
within a symbolic cost state $(l, C)$, the cost of a state $(l, u)$ is given by $C(u)$.

Definition 4 (Cost Function). A cost function $C : \mathbb{R}^C_{\ge 0} \to \mathbb{R}_{\ge 0} \cup \{\infty\}$ assigns
to each clock valuation, $u$, a non-negative real valued cost, $c$, or infinity. The support
$sup(C) = \{u \mid C(u) < \infty\}$ is the set of valuations mapped to a finite cost.

Table 1 summarizes several operations that are used by the symbolic semantics
and the algorithm in Fig. 1. In terms of the support of a cost function, the
operations behave exactly as on zones; e.g. $sup(r(C)) = r(sup(C))$. The
operations’ effect on the cost value reflects the intent to compute the minimum cost of
reaching a state, e.g., $r(C)(u)$ is the infimum of $C(v)$ for all $v$ that reset to $u$.



2.3 Symbolic Semantics 

The symbolic semantics for LPTA is very similar to the common zone based 
symbolic semantics used for timed automata. 

Definition 5 (Symbolic Semantics). Let $A = (L, l_0, E, I, P)$ be a linearly
priced timed automaton. The symbolic semantics is defined as a labelled
transition system over symbolic cost states of the form $(l, C)$, $l$ being a location and
$C$ a cost function, with the transition relation:

- $(l, C) \xrightarrow{\epsilon} (l, I(l)(delay(I(l)(C), P(l))))$

- $(l, C) \xrightarrow{a} (l', I(l')(r(g(C))) + p)$ iff $l \xrightarrow{g,a,r} l'$ and $p = P((l, g, a, r, l'))$.

The initial state is $(l_0, C_0)$ where $sup(C_0) = \{u_0\}$ and $C_0(u_0) = 0$.

Notice that the support of any cost function reachable by the symbolic semantics 
is a zone. 

Lemma 1. Given LPTA $A$, for each trace $\alpha$ of $A$ that ends in state $(l, u)$, there
exists a symbolic trace $\beta$ of $A$ that ends up in a symbolic cost state $(l, C)$, such
that $C(u) = cost(\alpha)$.

Lemma 2. Whenever $(l, C)$ is a reachable symbolic state and $u \in sup(C)$, then
$mincost(l, u) \le C(u)$ for all $u$.

Theorem 1. $mincost(l) = \min\{min(C) \mid (l, C) \text{ is reachable}\}$

Theorem 1 ensures that the algorithm in Fig. 1 indeed does find the minimum 
cost, but since the state-space is still infinite there is no guarantee that the algo- 
rithm ever terminates. For zone based timed automata model checkers, termina- 
tion is ensured by normalizing all zones with respect to a maximum constant M 
[Rok93], but for LPTA ensuring termination also depends on the representation 
of cost functions. 

2.4 Representing Cost Functions 

As stated in the introduction, we provide an efficient implementation of cost 
functions for the class of Uniformly Priced Timed Automata (UPTA). 

Definition 6 (Uniformly Priced Timed Automata). A uniformly priced
timed automaton is an LPTA where all locations have the same rate. We refer
to this rate as the rate of the UPTA.

Lemma 3. Any UPTA $A$ with positive rate can be translated into a UPTA $B$
with rate 1 such that $mincost(l)$ in $A$ is identical to $mincost(l)$ in $B$.

Thus, in order to find the infimum cost of reaching a satisfying state in UPTA, 
we only need to be able to handle rate zero and rate one. 

In case of rate zero, all symbolic states reachable by the symbolic semantics
have very simple cost functions: The support is mapped to the same integer
(because the cost is 0 in the initial state and only modified by the increment
operation). This means that a cost function $C$ can be represented as a pair $(Z, c)$,
where $Z$ is a zone and $c$ an integer, s.t. $C(u) = c$ when $u \in Z$ and $\infty$ otherwise.
Delay, reset and satisfaction are easily implementable for zones using DBMs.
Increment is a matter of incrementing $c$, and a comparison $(Z_1, c_1) \sqsubseteq (Z_2, c_2)$
reduces to $Z_2 \subseteq Z_1 \wedge c_1 \le c_2$. Termination is ensured by normalizing all zones
with respect to a maximum constant $M$.
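The rate-zero representation can be sketched as a small value class. To keep the example self-contained, a zone is modeled here by a finite `frozenset` of valuations, which is purely a stand-in for a DBM; the class name `RateZeroCost` and its method names are hypothetical, and zones are assumed to be non-empty when compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateZeroCost:
    """Cost function C(u) = cost for u in zone, infinity otherwise."""

    zone: frozenset  # stand-in for a DBM-encoded zone
    cost: int

    def increment(self, k):
        # Increment only touches the integer component.
        return RateZeroCost(self.zone, self.cost + k)

    def min_cost(self):
        return self.cost if self.zone else float("inf")

    def sqsubseteq(self, other):
        # (Z1, c1) below (Z2, c2) iff Z2 is contained in Z1 and c1 <= c2.
        return other.zone <= self.zone and self.cost <= other.cost


wide_cheap = RateZeroCost(frozenset({(0,), (1,)}), 2)
narrow_dear = RateZeroCost(frozenset({(0,)}), 5)
```

With these two values, `wide_cheap.sqsubseteq(narrow_dear)` holds while the converse does not, so a search that already passed `wide_cheap` would not need to explore `narrow_dear`.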

In case of rate one, the idea is to use zones over $C \cup \{\delta\}$, where $\delta$ is an
additional clock keeping track of the cost, s.t. every clock valuation $u$ is associated
with exactly one cost $Z(u)$ in zone $Z$^3. Then, $C(u) = c$ iff $u[\delta \mapsto c] \in Z$. This
is possible because the continuous cost advances at the same rate as time.
Delay, reset, satisfaction and infimum are supported directly by DBMs. Increment
$C + k$ translates to $Z[\delta \mapsto \delta + k] = \{u[\delta \mapsto u(\delta) + k] \mid u \in Z\}$ and is also
realizable using DBMs. For comparison between symbolic cost states, notice that
$Z_2 \subseteq Z_1 \Rightarrow Z_1 \sqsubseteq Z_2$, whereas the implication in the other direction does not
hold in general, see Fig. 3. However, it follows from the following Lemma 4 that
comparisons can still be reduced to set inclusion provided the zone is extended
in the $\delta$ dimension, see Fig. 3.

^3 We define $Z(u)$ to be $\infty$ if $u$ is not in $Z$.
Fig. 3. Let $x$ be a clock and let $\delta$ be the cost. In the figure, $Z \sqsubseteq Z_1 \sqsubseteq Z_2$, but
only $Z_1$ is a subset of $Z$. The $(\cdot)^\dagger$ operation removes the upper bound on $\delta$, hence
$Z_2^\dagger \subseteq Z^\dagger \Leftrightarrow Z \sqsubseteq Z_2$.

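Lemma 4. Let $Z^\dagger = \{u[\delta \mapsto u(\delta) + d] \mid u \in Z \wedge d \in \mathbb{R}_{\ge 0}\}$. Then
$Z_1 \sqsubseteq Z_2 \Leftrightarrow Z_2^\dagger \subseteq Z_1^\dagger$.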

It is straightforward to implement the $(\cdot)^\dagger$-operation on DBMs. However, a useful
property of the $(\cdot)^\dagger$-operation is that its effect on zones can be obtained without
implementing the operation. Let $(l_0, Z_0^\dagger)$, where $Z_0$ is the zone encoding $C_0$,
be the initial symbolic state. Then $Z = Z^\dagger$ for any reachable state $(l, Z)$,
intuitively because $\delta$ is never reset and no guards or invariants depend on $\delta$.

Termination is ensured if all clocks except for $\delta$ are normalized with respect
to a maximum constant $M$. It is important that normalization never touches
$\delta$. With this modification, the algorithm in Fig. 1 will essentially encounter
the same states as the traditional forward state-space exploration algorithm for
timed automata, except for the addition of $\delta$.

3 Improving the State-Space Exploration 

As mentioned, the major drawback of the algorithm in Fig. 1 is that it requires 
the entire state-space to be searched before the minimum cost of reaching a goal 
state can be declared. In this section we will discuss a number of possibilities for 
improving this in some cases. 

3.1 Minimum Cost Order 

In realizing the algorithm of Fig. 1, and in analogy with Dijkstra’s algorithm for
finding the shortest path in a directed weighted graph, we may choose always to
select a (symbolic cost) state $(l, C)$ from WAITING for which $C$ has the smallest
minimum cost. With this choice, we may terminate the algorithm as soon as a
goal state is selected from WAITING. We will refer to the search order arising from
this strategy as the Minimum Cost order (MC order).

Lemma 5. Using the MC order, an optimal solution is found by the algorithm 
in Fig. 1 when a goal state is selected from WAITING the first time. 
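The MC order amounts to replacing the waiting list by a priority queue keyed on the minimum cost and returning as soon as a goal state is selected. The sketch below uses Python's `heapq` on a hypothetical toy model in which a symbolic cost state is just a pair (location, cost); the graph `EDGES` and all function names are assumptions made for the example.

```python
import heapq
import itertools
import math

# Hypothetical toy model: location -> list of (target, price).
EDGES = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": []}


def successors(state):
    location, cost = state
    return [(target, cost + price) for target, price in EDGES[location]]


def min_cost_mc_order(initial, goal):
    tie = itertools.count()  # tie-breaker so states themselves are never compared
    waiting = [(initial[1], next(tie), initial)]
    passed = []
    explored = 0
    while waiting:
        _, _, state = heapq.heappop(waiting)  # smallest minimum cost first
        explored += 1
        if state[0] == goal:
            return state[1], explored         # first selected goal is optimal
        if not any(o[0] == state[0] and o[1] <= state[1] for o in passed):
            passed.append(state)
            for nxt in successors(state):
                heapq.heappush(waiting, (nxt[1], next(tie), nxt))
    return math.inf, explored


best, explored = min_cost_mc_order(("a", 0), "b")
```

In this toy run the state ("b", 4) is still waiting when the search stops, which is the saving over the exhaustive loop: the variable Cost is no longer needed and the remaining queue is simply discarded.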







When applying the MC order, the algorithm in Fig. 1 can be simplified since the
variable Cost is not needed any more. Again in analogy with Dijkstra’s shortest
path algorithm, the MC ordering finds the minimum cost of reaching a goal state
with guarantee of its optimality, in a manner which requires exploration of a
minimum number of symbolic cost states.

Lemma 6. Using the algorithm in Fig. 1, it can never reduce the number of
explored states to prefer exploration of a symbolic cost state of WAITING with
non-minimal minimum cost.

In sitnations when WAITING contains more than jnst one symbolic cost state with 
smallest minimnmcost, the MC order does not offer any indication as to which 
one to explore first. In fact, for exploration of the symbolic state-space for timed 
antomata withont cost, we do not know of a definite strategy for choosing a state 
from Waiting such that the fewest number of symbolic states are generated. 
However, any improvements gained with respect to the search-order strategy for 
the state-space exploration of timed automata will be directly applicable in our 
setting with respect to the strategy for choosing between symbolic cost states 
with same minimum cost. 
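
As an illustration only, the MC order can be sketched in Python on an explicit
graph of concrete states instead of symbolic cost states. The names mc_search,
successors and is_goal, and the toy graph, are hypothetical and are not taken
from the Uppaal implementation; the Waiting list is assumed to be a binary heap
keyed on the cost accumulated so far.

```python
import heapq
from itertools import count


def mc_search(initial, successors, is_goal):
    """Explore states in Minimum Cost order.

    `successors(state)` yields (next_state, step_cost) pairs with step_cost >= 0.
    Returns (cost, state) for the first goal state selected, or None.
    """
    tie = count()                         # tie-breaker: states are never compared
    waiting = [(0, next(tie), initial)]   # Waiting as a priority queue keyed on cost
    passed = {}                           # cheapest cost at which a state was explored
    while waiting:
        cost, _, state = heapq.heappop(waiting)
        if is_goal(state):
            return cost, state            # first goal selected is optimal (cf. Lemma 5)
        if state in passed and passed[state] <= cost:
            continue                      # already explored at no greater cost
        passed[state] = cost
        for nxt, step in successors(state):
            heapq.heappush(waiting, (cost + step, next(tie), nxt))
    return None


# Hypothetical toy graph: the cheapest path a -> c -> b -> goal costs 1 + 1 + 2 = 4.
GRAPH = {"a": [("b", 5), ("c", 1)], "c": [("b", 1)], "b": [("goal", 2)], "goal": []}
result = mc_search("a", lambda s: GRAPH[s], lambda s: s == "goal")
```

Among states of equal cost, this sketch simply uses insertion order; as noted
above, the MC order itself gives no guidance on that choice.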

3.2 Using Estimates of the Remaining Cost 

From a given state one often has an idea about the cost remaining in order to 
reach a goal state. In branch-and-bound algorithms this information is used both 
to delete states and to search the most promising states first. Using information 
about the remaining cost can also decrease the number of states searched before 
an optimal solution is reached. 

For a state (l, u) let rem((l, u)) be the minimum cost of reaching a goal state
from that state. In general we cannot expect to know exactly what the remaining
cost of a state is. We can instead use an estimate of the remaining cost as long
as the estimate does not exceed the actual cost. For a symbolic cost state (l, C)
we require that Rem(l, C) satisfies
$\mathrm{Rem}(l, C) \le \inf\{\, \mathrm{rem}((l, u)) \mid u \in \mathrm{sup}(C) \,\}$,
i.e. Rem(l, C) offers a lower bound on the remaining cost of all the states with
location l and clock valuation within the support of C.

Combining the minimum cost min(C) of a symbolic cost state (l, C) with the
estimate of the remaining cost Rem(l, C), we can base the MC order on the sum
of min(C) and Rem(l, C). Since min(C) + Rem(l, C) does not exceed the actual
cost of reaching a goal state, the first goal state to be explored is guaranteed to
have optimal cost. We call this the MC+ order, but it is also known as the
Least-Lower-Bound order. In Section 4 we will show that even simple estimates of the
remaining cost can lead to large improvements in the number of states searched
to find the minimum cost of reaching a goal state.
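
The optimality claim for the MC+ order can be spelled out in a few lines. The
derivation below is only a sketch in the notation of this section. It glosses
over the bookkeeping of Passed and Waiting, and it assumes in addition that the
estimate is nonnegative, which is not stated explicitly in the text. The
symbols $c^*$, $(l_g, C_g)$ and $(l', C')$ are introduced here for illustration.

```latex
% Assumption (not stated in the text): 0 <= Rem(l, C) for every symbolic cost state.
% (l_g, C_g): the first goal state selected from Waiting.
% c^*: the cost of an arbitrary run that reaches a goal state.
% At the moment (l_g, C_g) is selected, such a run passes through a concrete
% state (l', u) covered by some symbolic cost state (l', C') whose priority is
% at least that of (l_g, C_g).
\begin{align*}
c^* &\ge C'(u) + \mathrm{rem}((l', u))
      && \text{cost so far plus remaining cost} \\
    &\ge \min(C') + \mathrm{Rem}(l', C')
      && \text{$\mathrm{Rem}$ is a lower bound on $\mathrm{rem}$} \\
    &\ge \min(C_g) + \mathrm{Rem}(l_g, C_g)
      && \text{$(l_g, C_g)$ was selected with the smallest priority} \\
    &= \min(C_g)
      && \text{$0 \le \mathrm{Rem}(l_g, C_g) \le 0$ for a goal state.}
\end{align*}
```

Hence no run reaches a goal state more cheaply than min(C_g).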

One way to obtain a lower bound is for the user to specify an initial estimate 
and annotate each transition with updates of the estimate. In this case it is the 
responsibility of the user to guarantee that the estimate is actually a lower bound 
in order to ensure that the optimal solution is not deleted. This also allows the 
user to apply her understanding and intuition about the system.

3.3 Heuristics and Bounding

It is often useful to quickly obtain an upper bound on the cost instead of waiting
for the minimum cost. In particular, this is the case when faced with a state-space
too big for the MC order to handle. As will be shown in Section 4, the techniques
described here for altering the search order using heuristics are very useful. In
addition, techniques from branch-and-bound algorithms are useful for improving
the upper bound once it has been found. Applying knowledge about the goal
state has proven useful in improving the state-space exploration [RE99,HLP00],
either by changing the search order from the standard depth or breadth-first, or
by leaving out parts of the state-space.

To implement the MC order, a suitable data-structure for Waiting would
be a priority queue where the priority is the minimum cost of a symbolic cost
state. We can obviously generalize this by extending a symbolic cost state with a
new field, priority, which is the priority of the state used by the priority queue.
Allowing various ways of assigning values to priority, combined with choosing
to select first either a state with large or a state with small priority, opens up
a large variety of search orders.

Annotating the model with assignments to priority on the transitions is one
way of allowing the user to guide the search. Because of its flexibility it proves to
be a very powerful way of guiding the search. The assignment works like a normal
assignment to integer variables and allows for the same kind of expressions.

When searching for an error state in a system a random search order might
be useful. We have chosen to implement what we call random depth-first order
which, as the name suggests, is a variant of a depth-first search. The only difference
between this and a standard depth-first search is that before pushing all the
successors of a state on to Waiting (which is implemented as a stack), the
successors are randomly permuted.
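
A minimal sketch of the random depth-first order, assuming an explicit graph of
concrete states; random_dfs, successors, is_goal and the toy graph are
hypothetical names chosen for this illustration and do not describe the Uppaal
code.

```python
import random


def random_dfs(initial, successors, is_goal, rng=None):
    """Depth-first search that shuffles each state's successors before pushing them.

    Returns (goal_state_or_None, number_of_states_explored).
    """
    rng = rng or random.Random()
    waiting = [initial]                  # Waiting used as a stack
    passed = set()
    explored = 0
    while waiting:
        state = waiting.pop()
        if state in passed:
            continue
        passed.add(state)
        explored += 1
        if is_goal(state):
            return state, explored
        succ = list(successors(state))
        rng.shuffle(succ)                # the only difference from plain depth-first
        waiting.extend(succ)
    return None, explored


# Hypothetical toy graph in which state 5 plays the role of the error state.
GRAPH = {0: [1, 2, 3], 1: [4], 2: [4], 3: [5], 4: [5], 5: []}
found, explored = random_dfs(0, lambda s: GRAPH[s], lambda s: s == 5, random.Random(1))
```

Passing a seeded random.Random instance makes a run reproducible, which is
convenient when an error trace has to be regenerated.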

Once a reachable goal state has been found, an upper bound on the minimum
cost of reaching a goal state has been obtained. If we choose to continue the
search, a smaller upper bound might be obtained. During state-space exploration
the cost never decreases; therefore states with cost bigger than the best cost found
in a goal state cannot lead to an optimal solution, and can be deleted.
The estimate of the remaining cost defined in Section 3.2 can also be used for
pruning exploration of states, since whenever min(C) + Rem(l, C) is larger than
the best upper bound, no state covered by (l, C) can lead to a better solution
than the one already found.
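
The bounding step can be sketched as follows, again on an explicit graph of
concrete states and under the assumption that `rem` never overestimates the
remaining cost. The function branch_and_bound and the toy graph are
hypothetical and only illustrate the pruning rule described above.

```python
def branch_and_bound(initial, successors, is_goal, rem=lambda state: 0):
    """Depth-first search that keeps improving an upper bound and prunes with it.

    `rem(state)` must be a lower bound on the remaining cost (0 is always safe).
    Returns (best_cost, number_of_states_explored).
    """
    best = float("inf")                  # best goal cost found so far (upper bound)
    waiting = [(0, initial)]             # stack of (cost so far, state)
    passed = {}                          # cheapest cost at which a state was explored
    explored = 0
    while waiting:
        cost, state = waiting.pop()
        if cost + rem(state) > best:
            continue                     # cannot lead to a better solution: delete
        if state in passed and passed[state] <= cost:
            continue
        passed[state] = cost
        explored += 1
        if is_goal(state):
            best = min(best, cost)       # a smaller upper bound has been obtained
            continue
        for nxt, step in successors(state):
            waiting.append((cost + step, nxt))
    return best, explored


# Hypothetical toy graph: depth-first first finds cost 7, then improves it to 4.
GRAPH = {"a": [("c", 1), ("b", 5)], "c": [("b", 1)], "b": [("goal", 2)], "goal": []}
best, explored = branch_and_bound("a", lambda s: GRAPH[s], lambda s: s == "goal")
```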

All of the methods described in this section have been implemented in Uppaal.
Section 4 reports on experiments using these new methods.

4 Experiments 

In this section we illustrate the benefits of extending Uppaal with heuristics and
costs through several verification and optimization problems. All of the examples
have previously been studied in the literature.

4.1 The Bridge Problem

The following problem was proposed by Ruys and Brinksma [RB98]. A timed
automaton model of this problem is included in the standard distribution of
Uppaal^.

Four persons want to cross a bridge in the dark. The bridge is damaged
and can only carry two persons at the same time. To cross the bridge safely in
the darkness, a torch must be carried along. The group has only one torch to
share. Due to different physical abilities, the four cross the bridge at different
speeds. The time they need per person is (one-way) 25, 20, 10 and 5 minutes,
respectively. The problem is to find a schedule such that all four cross the bridge
within a given time. This can be done with standard Uppaal. With the proposed
extension, one can also find the best possible time for the persons to cross the
bridge, and a schedule for this.

We compare four different search orders: Breadth-First (BF), Depth-First
(DF), Minimum Cost (MC) and the improved Minimum Cost (MC+), which also uses
the estimate of the remaining cost, Rem(C). In this example we choose the
estimate of the remaining cost to be the time needed by the slowest person who
is still on the "wrong" side of the bridge.

Table 2 shows the number of states explored and the cost found for the first
and the optimal solution. The third column shows the number of states explored
and the cost when states are deleted based on the estimate of the remaining cost
(this does not apply to MC and MC+ because the search stops when the first
solution is found). As can be seen from the table, only about 10% of the states
searched to find an initial solution using breadth-first order are needed for the
MC+ order to find the optimal solution.
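
The bridge problem is small enough to replay on concrete states. The sketch
below is not the Uppaal model: it assumes that one or two persons cross
forwards and that exactly one person carries the torch back, and it counts
concrete states rather than symbolic cost states, so its state counts are not
comparable with those of Table 2. The function names are hypothetical. The
estimate of the remaining cost is the one described above, the time of the
slowest person still on the "wrong" side.

```python
import heapq
from itertools import combinations, count

TIMES = (25, 20, 10, 5)                  # one-way crossing times from the text


def successors(state):
    """Yield (next_state, minutes) for every crossing that carries the torch."""
    start, torch_at_start = state        # start: persons still on the "wrong" side
    if torch_at_start:                   # one or two persons cross with the torch
        for k in (1, 2):
            for group in combinations(sorted(start), k):
                yield (start - frozenset(group), False), max(group)
    else:                                # assumption: one person brings the torch back
        for p in set(TIMES) - start:
            yield (start | {p}, True), p


def rem(state):
    """Lower bound on the remaining time: the slowest person still to cross."""
    return max(state[0], default=0)


def search(use_estimate):
    """Return (optimal time, states explored) in MC order or, with estimate, MC+."""
    tie = count()
    init = (frozenset(TIMES), True)
    waiting = [(rem(init) if use_estimate else 0, next(tie), 0, init)]
    passed, explored = {}, 0
    while waiting:
        _, _, cost, state = heapq.heappop(waiting)
        if state in passed and passed[state] <= cost:
            continue
        passed[state] = cost
        explored += 1
        if not state[0]:                 # everybody has crossed
            return cost, explored
        for nxt, step in successors(state):
            prio = cost + step + (rem(nxt) if use_estimate else 0)
            heapq.heappush(waiting, (prio, next(tie), cost + step, nxt))


mc_cost, mc_states = search(use_estimate=False)
mcplus_cost, mcplus_states = search(use_estimate=True)
```

Both orders report the optimal time of 60 minutes given in Table 2.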



Table 2. Bridge problem by Ruys and Brinksma.

        Initial Solution    Optimal Solution    With est. remainder
        states    cost      states    cost      states    cost
BF        4491      65        4539      60        4493      60
DF         169     685       25780      60        5081      60
MC        1536      60        1536      60         N/A     N/A
MC+        404      60         404      60         N/A     N/A



4.2 Job Shop Scheduling 

A well known class of scheduling problems are the Job Shop problems. The 
problem is to optimally schedule a set of jobs on a set of machines. Each job 
is a chain of operations, usually one on each machine, and the machines have 
a limited capacity, also limited to one in most cases. The purpose is to allocate 
starting times to the operations, such that the overall duration of the schedule, 
the makespan, is minimal. 

^ The distribution can be obtained at http://www.uppaal.com.

We apply Uppaal to 25 of the smaller Lawrence Job Shop problems.^ Our
models are based on the timed automata models in [Feh99a]. In order to
estimate the lower bound on the remaining cost, we calculate for each job and each
machine the duration of the remaining operations. The final estimate of the
remaining cost is then taken to be the maximum of these durations. Table 3
shows results obtained for the first 15 problems for the search orders DF,
Random DF, and a combined heuristic. The latter is based on depth-first but also
takes into account the remaining operation times and the lower bound on the
cost, via a weighted sum which is assigned to the priority field of the symbolic
state. We also tried using BF and MC order, but did not obtain any results:
even if we allow MC order to search for more than 30 minutes using more than
2 GB of memory, no solution is found. With the MC+ order we could only find
solutions to la05 and la10, exploring 9791 and 10653 states respectively. It is
important to notice that the combined heuristic used includes a clever choice
between states with the same values of cost plus remaining cost. This is the
reason it is able to outperform the MC+ order.
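
The estimate of the remaining cost described above can be sketched as follows.
The data layout (a mapping from jobs to lists of unfinished (machine, duration)
operations) and the function name are assumptions made for this illustration,
and the two-job instance is hypothetical, not one of the Lawrence problems. The
sketch also ignores operations that are partly completed.

```python
def remaining_cost_estimate(remaining_ops):
    """Lower bound on the remaining makespan.

    `remaining_ops` maps each job to its list of unfinished operations,
    each given as a (machine, duration) pair.
    """
    per_job = {}                         # a job's operations run one after another
    per_machine = {}                     # a machine handles one operation at a time
    for job, ops in remaining_ops.items():
        per_job[job] = sum(duration for _, duration in ops)
        for machine, duration in ops:
            per_machine[machine] = per_machine.get(machine, 0) + duration
    # The estimate is the maximum over all jobs and all machines.
    return max(list(per_job.values()) + list(per_machine.values()), default=0)


# Hypothetical instance: job1 needs 3 + 4 = 7, job2 needs 6, machine m1 needs 3 + 6 = 9.
estimate = remaining_cost_estimate({
    "job1": [("m1", 3), ("m2", 4)],
    "job2": [("m1", 6)],
})
```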

As can be seen from the table, Uppaal handles the first 15 examples quite
well. For the 10 largest problems (la16 to la25) with 10 machines we did not find
optimal solutions, though in some cases we were very close to the optimal solution.
Since branch-and-bound algorithms generally do not scale too well when the
number of machines and jobs increases, this is not surprising. The
branch-and-bound algorithm of [AC91], which solves about 10 out of the 15 problems
in the same setting, faces the same problem. Note that the results of this algorithm
depend sensitively on the choice of an initial upper bound. Also the algorithm used
in [BJS95], which combines a good heuristic with an efficient branch-and-bound
algorithm and thus solves all of these 15 instances, does not find solutions for
the larger instances with 15 jobs and 10 machines or larger.



Table 3. Results for the smaller 15 Job Shop problems with 5 machines and
10 jobs (la01-la05), 15 jobs (la06-la10) and 20 jobs (la11-la15). The table shows the
best solution found by different search orders within 60 seconds cputime on a
Pentium II 300 MHz. If the search terminated, the number of explored states
is also given. The last row gives the makespan of an optimal solution.

problem instance    la01 la02 la03 la04 la05 la06 la07 la08 la09 la10 la11 la12 la13 la14 la15
DF          cost    2466 2360 2094 2212 1955 3656 3410 3520 3984 3681 4974 4557 4846 5145 5264
            states     -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
RDF         cost     842  806  769  783  696 1076 1113 1009 1154 1063 1303 1271 1227 1377 1459
            states     -    -    -    -    -    -    -    -    -    -    -    -    -    -    -
comb. heur  cost     666  672  626  639  593  926  890  863  951  958 1222 1039 1150 1292 1289
            states   292    -    -    -  284  480    -  400  425  454  642  633  662  688    -
minimal makespan     666  655  597  590  593  926  890  863  951  958 1222 1039 1150 1292 1207



^ These and other benchmark problems for Job Shop scheduling can be found on
ftp://ftp.caam.rice.edu/pub/people/applegate/jobshop/.

4.3 The Sidmar Steel Plant

Proving schedulability of an industrial plant via a reachability analysis of a
timed automaton model was first applied to the SIDMAR steel plant, which
was included as a case study of the Esprit-LTR Project 26270 VHS (Verification
of Hybrid Systems). The plant consists of five machines placed along two tracks
and a casting machine where the finished steel leaves the system. The two tracks
and the casting machine are connected via two overhead cranes on one track.
Each quantity of raw iron enters the system in a ladle and, depending on the
desired steel quality, undergoes treatments of different durations in the different
machines. The aim is to control the plant, in particular the movement of the ladles
with steel between the different machines, taking the topology of the plant into
consideration.

We use a model based on the models and descriptions in [BS99,Feh99b,HLP99].
A full model of the plant that includes all possible behaviors was however not
immediately suitable for verification. Using BF or DF search it was impossible to
generate a schedule for a model with only three ladles. Priorities can be used to
influence the search order of the state space, and thus to improve the results.
Based on a depth-first strategy, we reward transitions that are likely to serve in
reaching the goal, whereas transitions that may spoil a partial solution result in
lower priorities.

A schedule for three ladles was produced in [Feh99b] for a slightly simplified
model using Uppaal. In [HLP99] schedules for up to 60 ladles were produced, also
using Uppaal. However, in order to do this, additional constraints were included
that reduce the size of the state-space drastically, but also prune possibly sensible
behavior. A similar reduced model was used by Stobbe in [Sto00], who uses
constraint programming to schedule 30 ladles. All these works only consider
ladles with the same quality of steel and the initial solutions cannot be improved.

Using a search order based on the priorities we can generate a schedule for 
ten ladles, compared to two without priorities, with varying qualities of steel 
within 60 seconds cputime on a Pentium II 300 MHz. The initial solution found 
is improved by 5% within the time limit. Importantly, in this approach we do 
not rule out optimal solutions. Allowing the search to go on for longer, models 
with more ladles can be handled. 

4.4 Pure Heuristics: The Biphase Mark Protocol 

The Biphase Mark protocol is a convention for transmitting strings of bits and 
clock pulses simultaneously as square waves. This protocol is widely used for 
communication in the ISO/OSI physical layer; for example, a version called 
“Manchester encoding” is used in the Ethernet. The protocol ensures that strings 
of bits can be submitted and received correctly, in spite of clock drift, jitter and 
filtering by the channel. A formal parameterized timed automaton model of the 
Biphase Mark Protocol was given in [Vaa00]. We will use the corresponding
Uppaal models to investigate the benefits of heuristics in pure reachability
analysis.

Table 4. Results for nine erroneous instances of the Biphase Mark protocol.
The numbers are the number of states explored before reaching an error state.

breadth first      1931 2582 4049  990 4701 2561 1230 1709 3035
in==1 heuristic    1153 1431 2333  632 1945 1586  725 1039 1763



The three parameters in the model are the size of the mark and code cell of the
sending process and the size of the sampling distance at the receiver. Basically,
for each bit sent, two points need to be read for the receiver to interpret the
bit correctly. Three kinds of errors can occur: the "middle point" (called mark
subcell) is missed, the end point is sampled too early, or it is sampled too late.
Two of the three errors occur only if input "1" is offered to the receiver, and the
third error can occur in any case. Therefore we will guide the model to make a
breadth-first search but only in the part of the state-space where a "1" is sent. Table 4
shows the number of states searched in order to find the error in three erroneous
instances of the protocol. Using the heuristic almost halves the number of states
searched before the error is found.

5 Conclusion 

On the preceding pages, we have contributed (1) a cost function based symbolic
semantics for the class of linearly priced timed automata; (2) an efficient,
zone based implementation of cost functions for the class of uniformly priced
timed automata; (3) an optimal search order for finding the minimum cost of
reaching a goal state; and (4) experimental evidence that these techniques can
lead to dramatic reductions in the number of explored states. In addition, we
have shown that it is possible to quickly obtain upper bounds on the minimum
cost of reaching a goal state by manually guiding the exploration algorithm using
priorities.

Abstract. We present an extension of the model checker Uppaal capable
of synthesizing linear parameter constraints for the correctness of
parametric timed automata. The symbolic representation of the (parametric)
state-space is shown to be correct. A second contribution of this
paper is the identification of a subclass of parametric timed automata
(L/U automata), for which the emptiness problem is decidable, contrary
to the full class where it is known to be undecidable. Also we present a
number of lemmas enabling the verification effort to be reduced for L/U
automata in some cases. We illustrate our approach by deriving linear
parameter constraints for a number of well-known case studies from the
literature (exhibiting a flaw in a published paper).



1 Introduction 

During the last decade, there has been enormous progress in the area of timed
model checking. Tools such as Uppaal [11], Kronos [5], and PMC [12] are now
routinely used for industrial case studies. A disadvantage of the traditional
approaches is, however, that they can only be used to verify concrete timing
properties: one has to provide the values of all timing parameters that occur in the
system. For practical purposes, one is often interested in deriving the (symbolic)
constraints on the parameters that ensure correctness. The process of manually
finding and proving such results is very time consuming and error prone (we have
discovered minor errors in the two examples we have been looking at). Therefore
tool support for deriving the constraints automatically is very important.

In this paper, we study a parameterized extension of timed automata, as well
as a corresponding extension of the forward reachability algorithm. We show the
theoretical correctness of our approach, and its feasibility by application to some
non-trivial case studies. For this purpose, we have implemented a prototype
extension of Uppaal, an efficient real-time model checking tool [11]. The algorithm
we propose and have implemented is a semi-decision algorithm which will not
terminate in all cases. In [2] the problem of synthesizing values for parameters
such that a property is satisfied was shown to be undecidable, so this is the best
we can hope for.

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 189-203, 2001.

A second contribution of this paper is the identification of a subclass of
parameterized timed automata, called lower bound/upper bound (L/U) automata,
which appears to be sufficiently expressive from a practical perspective, while it
also has nice theoretical properties. Most importantly, we show that the emptiness
question for parametric timed automata, shown to be undecidable in [2], is
decidable for L/U automata. We also establish a number of lemmas which allow
one to reduce the number of parameters when tackling specific verification
questions for L/U automata. The application of these lemmas has already reduced
the verification effort drastically in some of our experiments.

Our attempt at automatic verification of parameterized real-time models is 
not the only one. Henzinger et al. aim at solving a more general problem with 
HyTech [9], a tool for model checking hybrid automata, exploring the state- 
space either by partition refinement, or forward reachability. The tool has been 
applied successfully on relatively small examples such as a railway gate controller. 
Experience so far has shown that HyTech cannot cope with larger examples, 
such as the ones considered in this paper. 

Toetenel et al. [12] have made an extension of the PMC real-time model 
checking tool [4] called LPMC. LPMC is restricted to linear parameter con- 
straints as is our approach, and uses the partition refinement method, like 
HyTech. Other differences with our approach are that LPMC also allows for 
the comparison of non-clock variables to parameter constraints, and for more 
general specification properties (full TCTL with fairness assumptions). Since 
LPMC is a quite recent tool, not many applications have been presented yet. 
However, a model of the IEEE 1394 root contention protocol inspired by [13] 
has been successfully analyzed in [4]. 

A more general attempt than LPMC and our Uppaal extension has been 
made by Annichini et al. [3]. They have constructed and implemented a method 
which allows for non-linear parameter constraints, and uses heavier, third-party, 
machinery to solve the arising non-linear constraint comparisons. Independently, 
we have used the same data-structure (a direct extension of DBMs [8]) for the 
symbolic representation of the state space, as in [3]. For speeding up the
exploration, a method for guessing the effect of control loops in the model is
presented. It appears that this helps termination of the method, but it is unclear
under what circumstances this technique can or cannot be used. The feasibility 
of this approach has been shown on a few rather small case studies. 

The remainder of this paper is organized as follows. Section 2 introduces the 
notion of parametric timed automata. Section 3 gives the symbolic semantics, 
which is the basis for our model checking algorithm, presented in Section 3.5. 
Section 4 is an intermezzo that states some helpful lemmas and decidability 
results on an interesting subclass. Finally, Section 5 reports on experiments with
our tool. For lack of space, some technical details and all proofs have been
omitted; they can be found in the full version of this paper [10].

2 Parametric Timed Automata

2.1 Parameters and Constraints

Throughout this paper, we assume a fixed set of parameters $P = \{p_1, \ldots, p_n\}$. A
linear expression $e$ is either an expression of the form $t_1 p_1 + \cdots + t_n p_n + t_0$, where
$t_0, \ldots, t_n \in \mathbb{Z}$, or $\infty$. We write $E$ to denote the set of all linear expressions. A
constraint is an inequality of the form $e \sim e'$, with $e, e'$ linear expressions and
$\sim \in \{<, \le, >, \ge\}$. The negation of constraint $c$, notation $\neg c$, is obtained by replacing
relation signs $<, \le, >, \ge$ by $\ge, >, \le, <$, respectively. A (parameter) valuation
is a function $v : P \to \mathbb{R}^{\ge 0}$ assigning a nonnegative real value to each parameter.
There is a one-to-one correspondence between valuations and points in $(\mathbb{R}^{\ge 0})^n$.
In fact we often identify a valuation $v$ with the point $(v(p_1), \ldots, v(p_n)) \in (\mathbb{R}^{\ge 0})^n$.

If $e$ is a linear expression and $v$ is a valuation, then $e[v]$ denotes the expression
obtained by replacing each parameter $p$ in $e$ with $v(p)$. Likewise, we define $c[v]$ for
$c$ a constraint. Valuation $v$ satisfies constraint $c$, notation $v \models c$, if $c[v]$ evaluates
to true. The semantics of a constraint $c$, notation $[\![c]\!]$, is the set of valuations
(points in $(\mathbb{R}^{\ge 0})^n$) that satisfy $c$. A finite set of constraints $C$ is called a constraint
set. A valuation satisfies a constraint set if it satisfies each constraint in the set.
The semantics of a constraint set $C$ is given by $[\![C]\!] := \bigcap_{c \in C} [\![c]\!]$. We write $\top$
to denote any constraint set with $[\![\top]\!] = (\mathbb{R}^{\ge 0})^n$, for instance the empty set. We
use $\bot$ to denote any constraint set with $[\![\bot]\!] = \emptyset$, for instance the constraint set
$\{c, \neg c\}$, for some arbitrary $c$.
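
To make these definitions concrete, the following Python sketch evaluates
linear expressions and constraints under a parameter valuation. The encoding is
an assumption made for this illustration: an expression is a pair (t0,
coefficients), a constraint is a triple (e, relation, e'), and the special
expression infinity is left out. None of the function names come from the tool.

```python
import operator
from fractions import Fraction

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}
NEGATE = {"<": ">=", "<=": ">", ">": "<=", ">=": "<"}   # relation of the negation


def evaluate(expr, v):
    """e[v] for a linear expression encoded as (t0, {parameter: coefficient})."""
    t0, coeffs = expr
    return t0 + sum(t * Fraction(v[p]) for p, t in coeffs.items())


def satisfies(v, constraint):
    """v |= c for a constraint encoded as (e, relation, e')."""
    lhs, rel, rhs = constraint
    return OPS[rel](evaluate(lhs, v), evaluate(rhs, v))


def negate(constraint):
    """The negation of c, obtained by replacing the relation sign."""
    lhs, rel, rhs = constraint
    return (lhs, NEGATE[rel], rhs)


def satisfies_all(v, constraint_set):
    """v satisfies a constraint set if it satisfies each constraint in it."""
    return all(satisfies(v, c) for c in constraint_set)


# c is the constraint 2p + 1 <= q, and v maps p to 1 and q to 3.
c = ((1, {"p": 2}), "<=", (0, {"q": 1}))
v = {"p": 1, "q": 3}
```

With this encoding the empty list plays the role of the constraint set that
every valuation satisfies, and [c, negate(c)] is satisfied by no valuation.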

Constraint $c$ covers constraint set $C$, notation $C \models c$, iff $[\![C]\!] \subseteq [\![c]\!]$. Constraint
set $C$ is split by constraint $c$ iff neither $C \models c$ nor $C \models \neg c$.

During the analysis questions arise of the kind: given a constraint set $C$ and
a constraint $c$, does $c$ hold, i.e., does constraint $c$ cover $C$? A split occurs when
$c$ holds for some valuations in the semantics of $C$ and $\neg c$ holds for some other
valuations. We will not discuss methods for answering such questions: in our
implementation we use an oracle to compute the following function.

$$
O(c, C) =
\begin{cases}
\text{yes}   & \text{if } C \models c \\
\text{no}    & \text{if } C \models \neg c \\
\text{split} & \text{otherwise}
\end{cases}
$$

Observe that using the oracle, we can easily decide semantic inclusion between
constraint sets: $[\![C]\!] \subseteq [\![C']\!]$ iff $\forall c' \in C' : O(c', C) = \text{yes}$. The oracle that we
use is a linear programming (LP) solver that was kindly provided to us by the
authors of [4], who built it for their LPMC model checking tool.



2.2 Parametric Timed Automata 

Throughout this paper, we assume a fixed set of clocks $X = \{x_0, \ldots, x_m\}$ and
a fixed set of actions $A = \{a_1, \ldots, a_k\}$. The special clock $x_0$, which is called the
zero clock, always has the value 0.

A simple guard is an expression $f$ of the form $x_i - x_j \prec e$, where $x_i, x_j$
are clocks, $\prec \in \{<, \le\}$, and $e$ is a linear expression. We say that $f$ is proper
if $i \ne j$. We define a guard to be a (finite) conjunction of simple guards. We
let $g$ range over guards and write $G$ to denote the set of guards. A clock
valuation is a function $w : X \to \mathbb{R}^{\ge 0}$ assigning a nonnegative real value to each
clock, such that $w(x_0) = 0$. We will identify a clock valuation $w$ with the point
$(w(x_0), \ldots, w(x_m)) \in (\mathbb{R}^{\ge 0})^{m+1}$. Let $g$ be a guard, $v$ a parameter valuation, and
$w$ a clock valuation. Then $g[v, w]$ denotes the expression obtained by replacing
each parameter $p$ with $v(p)$, and each clock $x$ with $w(x)$. A pair $(v, w)$ of a
parameter valuation and a clock valuation satisfies a guard $g$, notation $(v, w) \models g$,
if $g[v, w]$ evaluates to true. The semantics of a guard $g$, notation $[\![g]\!]$, is the set
of pairs $(v, w)$ such that $(v, w) \models g$.

A reset is an expression of the form $x_i := b$ where $i \ne 0$ and $b \in \mathbb{N}$. A reset
set is a set of resets containing at most one reset for each clock. The set of reset
sets is denoted by $R$.

We now define an extension of timed automata [1,15] called parametric timed
automata. Similar models have been presented in [2,3,4].

Definition 1 (PTA). A parametric timed automaton (PTA) over set of clocks
$X$, set of actions $A$, and set of parameters $P$, is a quadruple $\mathcal{A} = (Q, q_0, \to, I)$,
where $Q$ is a finite set of locations, $q_0 \in Q$ is the initial location,
$\to \subseteq Q \times A \times G \times R \times Q$ is a finite transition relation, and function $I : Q \to G$ assigns
an invariant to each location. We abbreviate a $(q, a, g, r, q') \in \to$ consisting of a
source location, an action, a guard, a reset set, and a target location as $q \xrightarrow{a,g,r} q'$.
For a simple guard $x_i - x_j \prec e$ to be used in an invariant it must be the case
that $x_j = x_0$, that is, the simple guard represents an upper bound on a clock.

Example 1. A parametric timed automaton with clocks $x$, $y$ and parameters $p$,
$q$ can be seen in Fig. 1. The initial state is S0 which has invariant $x \le p$, and the
transition from the initial location to S1 has guard $y \ge q$ and reset set $x := 0$.
There are no actions on the transitions. Since both clocks start at 0 and the
invariant stops time at $x = p$, initially the transition from S0 to S1 is
only enabled if $p \ge q$, otherwise the system will be deadlocked.




[Figure omitted; surviving labels: x<=5, y<=q+3]

Fig. 1. A parametric timed automaton



To define the semantics of PTAs, we require two auxiliary operations on clock
valuations. For clock valuation $w$ and nonnegative real number $d$, $w + d$ is the
clock valuation that adds to each clock (except $x_0$) a delay $d$. For clock valuation
$w$ and reset set $r$, $w[r]$ is the clock valuation that resets clocks according to $r$.

$$
(w + d)(x) = \begin{cases} 0 & \text{if } x = x_0 \\ w(x) + d & \text{otherwise} \end{cases}
\qquad
(w[r])(x) = \begin{cases} b & \text{if } x := b \in r \\ w(x) & \text{otherwise.} \end{cases}
$$
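
The two operations are easy to mirror on concrete data. In the sketch below a
clock valuation is a Python dict from clock names to numbers and a reset set is
a dict from clock names to natural numbers; this encoding and the names delay
and reset are assumptions made for illustration only.

```python
def delay(w, d, zero_clock="x0"):
    """w + d: add delay d to every clock except the zero clock."""
    return {x: (0 if x == zero_clock else value + d) for x, value in w.items()}


def reset(w, r):
    """w[r]: clocks mentioned in the reset set r take the value given there."""
    return {x: r.get(x, value) for x, value in w.items()}


w = {"x0": 0, "x": 1.5, "y": 0.5}
w1 = delay(w, 2)             # let 2 time units pass
w2 = reset(w1, {"x": 0})     # then apply the reset set {x := 0}
```

Both functions return a new valuation and leave their argument unchanged.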



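Definition 2 (Concrete semantics). Let $\mathcal{A} = (Q, q_0, \to, I)$ be a PTA and $v$
be a parameter valuation. The concrete semantics of $\mathcal{A}$ under $v$, notation $[\![\mathcal{A}]\!]_v$,
is the labeled transition system (LTS) $(S, S_0, \to)$ over $A \cup \mathbb{R}^{\ge 0}$ where

$$S = \{(q, w) \in Q \times (X \to \mathbb{R}^{\ge 0}) \mid w(x_0) = 0 \wedge (v, w) \models I(q)\},$$

$$S_0 = \{(q, w) \in S \mid q = q_0 \wedge w = \lambda x.0\},$$

and transition predicate $\to$ is specified by the following two rules, for all $(q, w)$,
$(q', w') \in S$, $d \ge 0$ and $a \in A$,

- $(q, w) \xrightarrow{d} (q', w')$ if $q = q'$ and $w' = w + d$,

- $(q, w) \xrightarrow{a} (q', w')$ if $\exists g, r : q \xrightarrow{a,g,r} q' \wedge (v, w) \models g \wedge w' = w[r]$.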



2.3 The Problem 

In its current version, Uppaal is able to check for reachability properties, in
particular whether certain combinations of locations and constraints on clock
variables are reachable from the initial configuration. Our parameterized
extension of Uppaal handles exactly the same properties. However, rather than just
telling whether a property holds or not, our tool looks for constraints on the
parameters which ensure that the property holds.

Definition 3 (Properties). The sets of system properties and state formulas
are defined by, respectively,

$$\psi ::= \forall\Box\phi \mid \exists\Diamond\phi \qquad\qquad \phi ::= x - y \prec b \mid q \mid \neg\phi \mid \phi \wedge \phi$$

where $x, y \in X$, $b \in \mathbb{N}$ and $q \in Q$. Let $\mathcal{A}$ be a PTA, $v$ a parameter valuation,
$s$ a state of $[\![\mathcal{A}]\!]_v$, and $\phi$ a state formula. We write $s \models \phi$ if $\phi$ holds in state $s$,
we write $[\![\mathcal{A}]\!]_v \models \forall\Box\phi$ if $\phi$ holds in all reachable states of $[\![\mathcal{A}]\!]_v$, and we write
$[\![\mathcal{A}]\!]_v \models \exists\Diamond\phi$ if $\phi$ holds for some reachable state of $[\![\mathcal{A}]\!]_v$.

The problem that we address in this paper can now be stated as follows: Given
a parametric timed automaton $\mathcal{A}$ and a system property $\psi$, compute the set of
parameter valuations $v$ for which $[\![\mathcal{A}]\!]_v \models \psi$.

Timed automata [1,15] arise as a special case of PTAs for which the set $P$
of parameters is empty. If $\mathcal{A}$ is a PTA and $v$ is a parameter valuation, then the
structure $\mathcal{A}[v]$ that is obtained by replacing all linear expressions $e$ that occur in
$\mathcal{A}$ by $e[v]$ is a timed automaton.^ It is easy to see that in general $[\![\mathcal{A}]\!]_v = [\![\mathcal{A}[v]]\!]$.
Since the reachability problem for timed automata is decidable [1], this implies
that, for any $\mathcal{A}$, integer valued $v$ and $\psi$, $[\![\mathcal{A}]\!]_v \models \psi$ is decidable.



^ Strictly speaking, $\mathcal{A}[v]$ is only a timed automaton if $v$ assigns an integer to each
parameter.

3 Symbolic State Exploration

Our aim is to use basically the same algorithm for parametric timed model
checking as for timed model checking. We represent sets of states symbolically in a
similar way and support the same operations used for timed model checking.
In the nonparametrized case, sets of states can be efficiently represented using
matrices [8]. Similarly, in this paper we represent sets of states symbolically as
(constrained) parametric difference-bound matrices.

3.1 Parametric Difference-Bound Matrices 

In the nonparametrized case, a difference-bound matrix is a (m -h 1) x (m -h 1) 
matrix whose entries are elements from (Z U {cxd}) x {0, 1}. An entry (c, 1) for 
Dij denotes a nonstrict bound Xi — xj < c, whereas an entry (c, 0) denotes a 
strict bound Xi — xj < c. Here, instead of using integers in the entries, we will 
use linear expressions over the parameters. Also, we find it convenient to view 
the matrix slightly more abstractly as a set of guards. 

Definition 4 (PDBM). A parametric difference-bound matrix (PDBM) is a
set $D$ which contains, for all $0 \leq i,j \leq m$, a simple guard $D_{ij}$ of the form
$x_i - x_j \prec_{ij} e_{ij}$. We require that, for all $i$, $D_{ii}$ is of the form $x_i - x_i \leq 0$. Given
a parameter valuation $v$, the semantics of $D$ is defined by $[\![D]\!]_v = [\![\bigwedge_{i,j} D_{ij}]\!]_v$.
We say that $D$ is satisfiable for $v$ if $[\![D]\!]_v$ is nonempty. If $f$ is a proper guard
of the form $x_i - x_j \prec e$ then we write $D[f]$ for the PDBM obtained from $D$ by
replacing $D_{ij}$ by $f$. If $i,j$ are indices then we write $\mathbf{D}_{ij}$ for the pair $(e_{ij}, \prec_{ij})$;
we call $\mathbf{D}_{ij}$ a bound of $D$. Clearly, a PDBM is fully determined by its bounds.

Definition 5 (Constrained PDBM). A constrained PDBM is a pair $(C, D)$
where $C$ is a constraint set and $D$ is a PDBM. The semantics of a constrained
PDBM is defined by $[\![C, D]\!] = \{(v, w) \mid v \in [\![C]\!] \wedge w \in [\![D]\!]_v\}$.

PDBMs with the tightest possible bounds are called canonical. To formalize
this notion, we view Boolean connectives as operations on relation symbols $\leq$ and
$<$ by identifying $\leq$ with 1 and $<$ with 0. Thus we have, for instance, $(\leq \wedge \leq) = \leq$,
$(\leq \wedge <) = <$ and $(< \Rightarrow \leq) = \leq$. Our definition of a canonical form of a
constrained PDBM is essentially equivalent to the one for standard DBMs.

Definition 6 (Canonical Form). A constrained PDBM $(C, D)$ is in canonical
form iff for all $i,j,k$, $C \models e_{ij}\, (\prec_{ij} \Rightarrow \prec_{ik} \wedge \prec_{kj})\, e_{ik} + e_{kj}$.

The next important lemma, which basically carries over from the unpara- 
metrized case, states that canonicity of a constrained PDBM guarantees satisfi- 
ability. 

Lemma 1. Suppose $(C, D)$ is a constrained PDBM in canonical form and $v \in [\![C]\!]$.
Then $D$ is satisfiable for $v$.

Also the following lemma essentially carries over from the unparametrized 
case, see for instance [8]. As a direct consequence, semantic inclusion of con- 
strained PDBMs is decidable for canonical PDBMs (using the oracle function). 

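Lemma 2. Suppose $(C, D)$, $(C', D')$ are constrained PDBMs and $(C, D)$ is canonical.
Then $[\![C, D]\!] \subseteq [\![C', D']\!] \iff ([\![C]\!] \subseteq [\![C']\!] \wedge \forall i,j : C \models e_{ij}\, (\prec_{ij} \Rightarrow \prec'_{ij})\, e'_{ij})$.
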
3.2 Operations on PDBMs

Our algorithm requires basically four operations to be implemented on con- 
strained PDBMs: adding guards, canonicalization, resetting clocks and comput- 
ing time successors. 



Adding Guards In the case of DBMs, adding a guard is a simple operation.
It is implemented by taking the conjunction of a DBM and the guard (which
is also viewed as a DBM). The conjunction operation just takes the pointwise
minimum of the entries in both matrices. In the parametric case, adding a guard
to a constrained PDBM may result in a set of constrained PDBMs. We define
a relation $\Leftarrow$ which relates a constrained PDBM and a guard to a collection of
constrained PDBMs that satisfy this guard. For this we need an operation $C$
that takes a PDBM and a simple guard, and produces a constraint stating that
the bound imposed by the guard is larger than the corresponding bound in the
PDBM: if $\mathbf{D}_{ij} = (e_{ij}, \prec_{ij})$ then $C(D, x_i - x_j \prec e) = e_{ij}\, (\prec_{ij} \Rightarrow \prec)\, e$.

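Relation $\Leftarrow$ is defined as the smallest relation that satisfies the following rules:

(R1) $\dfrac{\mathcal{O}(C(D,f), C) = \text{yes}}{(C, D) \stackrel{f}{\Leftarrow} (C, D)}$

(R2) $\dfrac{\mathcal{O}(C(D,f), C) = \text{no}, \quad f \text{ proper}}{(C, D) \stackrel{f}{\Leftarrow} (C, D[f])}$

(R3) $\dfrac{\mathcal{O}(C(D,f), C) = \text{split}}{(C, D) \stackrel{f}{\Leftarrow} (C \cup \{C(D,f)\}, D)}$

(R4) $\dfrac{\mathcal{O}(C(D,f), C) = \text{split}, \quad f \text{ proper}}{(C, D) \stackrel{f}{\Leftarrow} (C \cup \{\neg C(D,f)\}, D[f])}$

(R5) $\dfrac{(C, D) \stackrel{g}{\Leftarrow} (C', D'), \quad (C', D') \stackrel{g'}{\Leftarrow} (C'', D'')}{(C, D) \stackrel{g \wedge g'}{\Leftarrow} (C'', D'')}$

Lemma 3. $[\![C, D]\!] \cap [\![g]\!] = \bigcup \{[\![C', D']\!] \mid (C, D) \stackrel{g}{\Leftarrow} (C', D')\}$.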
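
The three oracle answers can be illustrated with a deliberately small toy, sketched here under strong assumptions: there is a single parameter `p`, the constraint set is just an interval of values for `p`, linear expressions are pairs `(t0, t1)` standing for `t0 + t1*p`, and strictness is ignored. None of this is the oracle used by the authors' prototype; the names `value_range` and `oracle` are hypothetical.

```python
import math

def value_range(a, b, lo, hi):
    """Minimum and maximum of a + b*p for p in [lo, hi] (hi may be math.inf)."""
    ends = [a if b == 0 else a + b * p for p in (lo, hi)]
    return min(ends), max(ends)

def oracle(e_ij, e, interval):
    """Toy oracle for the constraint e_ij <= e over one parameter p.

    e_ij is the bound currently stored in the PDBM, e the bound of the
    guard being added, interval = (lo, hi) the allowed values of p.
    """
    lo, hi = interval
    a, b = e[0] - e_ij[0], e[1] - e_ij[1]     # e - e_ij = a + b*p
    low, high = value_range(a, b, lo, hi)
    if low >= 0:
        return "yes"      # the guard never tightens the PDBM (rule R1)
    if high < 0:
        return "no"       # the guard always tightens the PDBM (rule R2)
    return "split"        # it depends on p (rules R3 and R4)
```

For a stored bound `x1 - x0 <= p` and a guard `x1 - x0 <= 5`, the call `oracle((0, 1), (5, 0), (0, 3))` answers `"yes"`, the interval `(6, 10)` gives `"no"`, and `(0, 10)` gives `"split"`, which is the case where the constraint set has to be refined in two ways.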



Canonicalization Each DBM can be brought into canonical form using classical
algorithms for computing all-pairs shortest paths, for instance the
Floyd-Warshall (FW) algorithm [6]. In the parametric case, we also apply this approach
except that now we run FW symbolically. Below, we describe the computation
steps of the symbolic FW algorithm in SOS style. Recall that the FW algorithm
consists of three nested for-loops, for indices $k$, $i$ and $j$, respectively.
Correspondingly, in the SOS description of the symbolic version, we use configurations of the
form $(k, i, j, C, D)$, where $(C, D)$ is a constrained PDBM and $k, i, j \in [0, m+1]$
record the values of indices. In the rules below, $k, i, j$ range over $[0, m]$.



$\dfrac{(C, D) \stackrel{x_i - x_j \,(\prec_{ik} \wedge \prec_{kj})\, e_{ik} + e_{kj}}{\Leftarrow} (C', D')}{(k, i, j, C, D) \rightarrow_{FW} (k, i, j+1, C', D')}$

$(k, i, m+1, C, D) \rightarrow_{FW} (k, i+1, 0, C, D)$
$(k, m+1, 0, C, D) \rightarrow_{FW} (k+1, 0, 0, C, D)$

We write $(C, D) \rightarrow_c (C', D')$ if there exists a sequence of $\rightarrow_{FW}$ steps leading
from configuration $(0, 0, 0, C, D)$ to configuration $(m+1, 0, 0, C', D')$. In this case,
we say that $(C', D')$ is an outcome of the symbolic Floyd-Warshall algorithm on
$(C, D)$. If the semantics of $(C, D)$ is empty, then the set of outcomes is also
empty. We write $(C, D) \stackrel{g}{\Leftarrow}_c (C', D')$ iff $(C, D) \stackrel{g}{\Leftarrow} (C'', D'') \rightarrow_c (C', D')$, for
some $C'', D''$.

The following lemma says that if we run the symbolic Floyd-Warshall algo- 
rithm, the union of the semantics of the outcomes equals the semantics of the 
original constrained PDBM. 

Lemma 4. $[\![C, D]\!] = \bigcup \{[\![C', D']\!] \mid (C, D) \rightarrow_c (C', D')\}$.
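
For comparison, the classical (non-symbolic) Floyd-Warshall canonicalization of an ordinary DBM can be sketched as follows. Bounds are pairs `(c, s)` with `s = 1` for nonstrict and `s = 0` for strict; this encoding and the function names are assumptions chosen for the illustration. The symbolic version described above differs in that each `min` becomes an "add guard" step that may split the constraint set.

```python
import math

INF = (math.inf, 0)

def add(b1, b2):
    """Sum of two bounds: constants add, the result is nonstrict only if both are."""
    return (b1[0] + b2[0], min(b1[1], b2[1]))

def canonical(d):
    """Floyd-Warshall closure of a DBM; returns None if the zone is empty."""
    n = len(d)
    d = [row[:] for row in d]            # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], add(d[i][k], d[k][j]))
    # A diagonal entry below (0, <=) signals a negative cycle, i.e. an empty zone.
    if any(d[i][i] < (0, 1) for i in range(n)):
        return None
    return d

# x1 - x0 <= 5 and x2 - x1 < 3 imply x2 - x0 < 8 after canonicalization.
zone = [[(0, 1), (0, 1), (0, 1)],
        [(5, 1), (0, 1), INF],
        [INF,    (3, 0), (0, 1)]]
closed = canonical(zone)
```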

Resetting Clocks A third operation on PDBMs that we need is resetting 
clocks. Since we do not allow parameters in reset sets, the reset operation on 
PDBMs is essentially the same as for DBMs, see [15]. The following lemma 
characterizes the reset operation semantically. 

Lemma 5. Let $(C, D)$ be a constrained PDBM in canonical form, $v \in [\![C]\!]$, and
$w$ a clock valuation. Then $w \in [\![D[r]]\!]_v$ iff $\exists w' \in [\![D]\!]_v : w = w'[r]$.



Time Successors Finally, we need to transform PDBMs for the passage of
time, notation $D\!\uparrow$. As in the DBMs case [8], this is done by setting the $x_i - x_0$
bounds to $(\infty, <)$, for each $i \neq 0$, and leaving all other bounds unchanged. We
have the following lemma.

Lemma 6. Suppose $(C, D)$ is a constrained PDBM in canonical form, $v \in [\![C]\!]$,
and $w$ a clock valuation. Then $w \in [\![D\!\uparrow]\!]_v$ iff $\exists d \geq 0\ \exists w' \in [\![D]\!]_v : w' + d = w$.
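
The reset and time-successor operations are simple index manipulations on a canonical matrix. The sketch below shows them for ordinary DBMs with bounds encoded as `(c, s)` pairs (`s = 1` nonstrict, `s = 0` strict); it only handles resets to 0, and the function names `up` and `reset` are assumptions for illustration. In the parametric setting the same updates are applied to entries that are linear expressions.

```python
import math

INF = (math.inf, 0)

def up(d):
    """Time successors: drop the upper bounds x_i - x_0 for every i != 0."""
    d = [row[:] for row in d]
    for i in range(1, len(d)):
        d[i][0] = INF
    return d

def reset(d, r):
    """Reset clock x_r to 0 in a canonical DBM: x_r inherits the bounds of x_0."""
    d = [row[:] for row in d]
    for j in range(len(d)):
        if j != r:
            d[r][j] = d[0][j]    # x_r - x_j is bounded like x_0 - x_j
            d[j][r] = d[j][0]    # x_j - x_r is bounded like x_j - x_0
    return d

# Canonical DBM for the zone 1 <= x1 <= 5.
zone = [[(0, 1), (-1, 1)],
        [(5, 1), (0, 1)]]
later = up(zone)          # x1 >= 1, no upper bound
fresh = reset(zone, 1)    # x1 = 0
```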



3.3 Symbolic Semantics 

With the four operations on PDBMs, we can describe the semantics of a para- 
metric timed automaton symbolically. 

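Definition 7 (Symbolic semantics). The symbolic semantics of PTA $A = (Q, q_0, \rightarrow, I)$
is an LTS. The states are triples $(q, C, D)$ with $q$ a location from
$Q$ and $(C, D)$ a constrained PDBM in canonical form. Let $E$ be the PDBM with
$\mathbf{E}_{ij} = (0, \leq)$, for all $i,j$. The set of initial states is
$\{(q_0, C, D) \mid (\top, E\!\uparrow) \stackrel{I(q_0)}{\Leftarrow}_c (C, D)\}$. The transitions are defined by the following rule:

$\dfrac{q \xrightarrow{a,g,r} q', \quad (C, D) \stackrel{g}{\Leftarrow}_c (C'', D''), \quad (C'', D''[r]\!\uparrow) \stackrel{I(q')}{\Leftarrow}_c (C', D')}{(q, C, D) \rightarrow (q', C', D')}$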

Using Lemma 3 and Lemma 4, it follows by a simple inductive argument that if
state $(q, C, D)$ is reachable in the symbolic semantics and $(v, w) \in [\![C, D]\!]$ then
$(v, w) \models I(q)$. It is also easy to see that the symbolic semantics of a PTA is a
finitely branching transition system. It may have infinitely many reachable states






though. Our search algorithm explores the symbolic semantics in an “intelligent”
manner, and for instance stops whenever it reaches a state whose semantics is
contained in the semantics of a state that has been encountered before. Despite
this, our algorithm need not terminate.

Each run in the symbolic semantics can be simulated by a run in the concrete 
semantics. 

Proposition 1. For each parameter valuation $v$ and clock valuation $w$, if there
is a run in the symbolic semantics of $A$ reaching state $(q, C, D)$, with $(v, w) \in
[\![C, D]\!]$, then this run can be simulated by a run in the concrete semantics $[\![A]\!]_v$
reaching state $(q, w)$.

For each path in the concrete semantics, we can find a path in the symbolic 
semantics such that the final state of the first path is semantically contained in 
the final state of the second path. 

Proposition 2. For each parameter valuation $v$ and clock valuation $w$, if there
is a run in the concrete semantics $[\![A]\!]_v$ reaching a state $(q, w)$, then this run can
be simulated by a run in the symbolic semantics reaching a state $(q, C, D)$ such
that $(v, w) \in [\![C, D]\!]$.



3.4 Evaluating Properties 



We will now explain the relation $\models_{\varphi}$ which relates a symbolic state and a state
formula $\varphi$ to a collection of symbolic states that satisfy $\varphi$. For lack of space, we
do not give the full formal definition.

In order to check whether a property holds, we break it down into the small 
basic formulas, namely checking locations and clock guards. Checking that a 
clock guard holds relies on the definition given earlier, of adding that clock 
guard to the constrained PDBM. We rely on a special normal form of the state 
formula, in which all -i signs have been pushed down to the basic formulas. 

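The following lemma gives the soundness of relation $\models_{\varphi}$.

Lemma 7. Let $[\![\varphi, q]\!]$ denote the set $\{(v, w) \mid (q, w) \models \varphi\}$. Then for all properties
$\varphi$ in normal form, $[\![C, D]\!] \cap [\![\varphi, q]\!] = \bigcup \{[\![C', D']\!] \mid (q, C, D) \models_{\varphi} (q, C', D')\}$.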



3.5 Algorithm 

We are now in a position to present our model checking algorithm for parametric
timed automata. The following algorithm describes how our tool explores the
symbolic state space and searches for constraints on the parameters for which a
reachability formula $\exists\Diamond\varphi$ holds in a PTA $A$.







algorithm Reachable($A$, $\varphi$)
    Result := $\emptyset$, Passed := $\emptyset$, Waiting := $\{(q_0, C, D) \mid (\top, E\!\uparrow) \stackrel{I(q_0)}{\Leftarrow}_c (C, D)\}$
    while Waiting $\neq \emptyset$ do
        select $(q, C, D)$ from Waiting
        Result := Result $\cup\ \{(q', C', D') \mid (q, C, D) \models_{\varphi} (q', C', D')\}$
        False := $\{(q', C', D') \mid (q, C, D) \models_{\neg\varphi} (q', C', D')\}$
        for each $(q', C', D')$ in False do
            if for all $(q'', C'', D'')$ in Passed: $(q', C', D') \not\subseteq (q'', C'', D'')$ then
                add $(q', C', D')$ to Passed
                for each $(q'', C'', D'')$ such that $(q', C', D') \rightarrow (q'', C'', D'')$ do
                    Waiting := Waiting $\cup\ \{(q'', C'', D'')\}$
    return Result

The result returned by the algorithm is a set of symbolic states, all of which
satisfy $\varphi$ for any valuation of the parameters and clocks in the state. For
invariance properties $\forall\Box\varphi$ the tool performs the algorithm on $\neg\varphi$, and the result is
then a set of symbolic states, none of which satisfies $\varphi$. The answer to the model
checking problem, stated in Section 2.2, is obtained by taking the union of the
constraint sets from all symbolic states in the result of the algorithm; in the case
of an invariance property we take the complement of this set.
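
The control structure of this search can be sketched independently of PDBMs. In the fragment below the symbolic machinery is passed in as four callbacks (`successors`, `satisfy`, `falsify`, `included`); these names and the use of plain lists are assumptions for the illustration and do not reflect the prototype's actual interfaces.

```python
def reachable(initial, successors, satisfy, falsify, included):
    """Passed/waiting exploration in the style of algorithm Reachable.

    satisfy(s)   -> symbolic states derived from s that satisfy phi
    falsify(s)   -> symbolic states derived from s that satisfy not phi
    successors(s)-> symbolic successor states of s
    included(s,t)-> True if the semantics of s is contained in that of t
    """
    result, passed, waiting = [], [], list(initial)
    while waiting:
        state = waiting.pop()
        result.extend(satisfy(state))
        for f in falsify(state):
            # Explore f only if no passed state already covers it.
            if not any(included(f, p) for p in passed):
                passed.append(f)
                waiting.extend(successors(f))
    return result

# Toy instance: states are integers on a cycle of length 4, phi holds in state 3.
found = reachable(
    initial=[0],
    successors=lambda n: [(n + 1) % 4],
    satisfy=lambda n: [n] if n == 3 else [],
    falsify=lambda n: [n] if n != 3 else [],
    included=lambda s, t: s == t,
)
```

As in the paper's algorithm, termination is not guaranteed in general; in the toy instance it follows from the finite state space and the inclusion check.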

Some standard operations on symbolic states that help to explore as little of the
state space as possible have also been implemented in our tool for parametric symbolic states.
We give a short explanation here, and refer to the full version of this paper for the 
complete story with technical details. Before starting the state space exploration, 
our implementation determines the maximal constant for each clock. This is the 
maximal value to which the clock is compared in any guard or invariant in the 
PTA. When the clock value grows beyond this value, we can ignore its real value. 
This enables us to identify many more symbolic states, and helps termination. 

4 Reducing the Complexity 

This section introduces the class of lower bound/upper bound automata and 
describes several (rather intuitive) observations that simplify the model checking 
of PTAs in this class. Our results allow us to eliminate parameters in certain 
cases. Since the complexity of parametric model checking grows very fast in 
the number of parameters, this is a relevant issue. Secondly, our observations 
yield a decidability result for lower bound/upper bound automata whereas the 
corresponding problem for general PTAs is undecidable. 

Informally, a positive occurrence of a parameter in a PTA enforces (or
contributes to) an upper bound on a clock difference, for instance $p$ in $x - y < 2p$.
A negative occurrence of a parameter contributes to a lower bound on a clock
difference, for instance $q$ and $q'$ in $y - x > q + 2q'$ ($\equiv x - y < -q - 2q'$) and in
$x - y < 2p - q - 2q'$.

Definition 8. A parameter $p_i \in P$ is said to occur in the linear expression
$e = t_0 + t_1 \cdot p_1 + \cdots + t_n \cdot p_n$ if $t_i \neq 0$; $p_i$ occurs positively in $e$ if $t_i > 0$ and
$p_i$ occurs negatively in $e$ if $t_i < 0$. A lower bound parameter of a PTA $A$ is
a parameter that only occurs negatively in the expressions of $A$ and an upper
bound parameter of $A$ a parameter that only occurs positively in $A$. We call $A$ a
lower bound/upper bound (L/U) automaton if every parameter is either a lower
bound parameter or an upper bound parameter of $A$, but not both.
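
Definition 8 amounts to a sign check on coefficients. A minimal sketch, assuming that every guard and invariant has already been brought into the form $x - y \prec e$ and that each expression $e$ is given as a dictionary from parameter names to integer coefficients (the constant term is irrelevant here); the function name `classify` is hypothetical:

```python
def classify(expressions):
    """Split parameters into lower bound and upper bound parameters.

    expressions: iterable of dicts {parameter name: coefficient t_i}.
    Returns (lower, upper, is_lu). Parameters that never occur with a
    non-zero coefficient are not reported by this sketch.
    """
    positive, negative = set(), set()
    for e in expressions:
        for p, t in e.items():
            if t > 0:
                positive.add(p)
            elif t < 0:
                negative.add(p)
    lower = negative - positive      # only negative occurrences
    upper = positive - negative      # only positive occurrences
    return lower, upper, not (positive & negative)

# The guards x - y < 2p,  x - y < -q - 2q'  and  x - y < 2p - q - 2q'.
lower, upper, is_lu = classify([
    {"p": 2},
    {"q": -1, "q'": -2},
    {"p": 2, "q": -1, "q'": -2},
])
```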

From now on, we work with a fixed set $L = \{l_1, \ldots, l_K\}$ of lower bound
parameters and a fixed set $U = \{u_1, \ldots, u_M\}$ of upper bound parameters with
$L \cap U = \emptyset$ and $L \cup U = P$.

We consider, apart from parameter valuations, also extended parameter
valuations. Intuitively, an extended parameter valuation is a parameter valuation
with values in $\mathbb{R}^{\geq 0} \cup \{\infty\}$, rather than in $\mathbb{R}^{\geq 0}$. We denote an extended valuation
of an L/U automaton by a pair $(\lambda, \mu)$, which equals the function $\lambda$ on the set
$L$ and $\mu$ on $U$, and require that $\lambda$ and $\mu$ do not both assign the value $\infty$ to a
parameter. Then we can extend the notions defined for parameter valuations
(Section 2) to extended valuations in the obvious way. We write $0$ and $\infty$ for the
functions assigning respectively $0$ and $\infty$ to each parameter.

The following proposition is based on the fact that weakening the guards in 
A (i.e. decreasing the lower bounds and increasing the upper bounds) yields an 
automaton whose reachable states include those of A. Dually, strengthening the 
guards in A (i.e. increasing the lower bounds and decreasing the upper bounds) 
yields an automaton whose reachable states are a subset of those of A. We 
claim that this proposition, formulated for L/U automata, can be generalized to 
lower bound and upper bound parameters present in general PTAs. It is however 
crucial that (by definition) state formulae do not contain parameters. 

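Proposition 3. Let $A$ be an L/U automaton and $\varphi$ a state formula. Then

$[\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond\varphi \iff \forall \lambda' \leq \lambda,\ \mu \leq \mu' : [\![A]\!]_{(\lambda',\mu')} \models \exists\Diamond\varphi$.

$[\![A]\!]_{(\lambda,\mu)} \models \forall\Box\varphi \iff \forall \lambda \leq \lambda',\ \mu' \leq \mu : [\![A]\!]_{(\lambda',\mu')} \models \forall\Box\varphi$.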

The following example illustrates how Proposition 3 can be used to eliminate 
parameters from L/U automata. 

Example 2. The automaton in Fig. 2 is an L/U automaton. Its location $S_1$ is
reachable irrespective of the parameter values. By setting the parameter $min$
to $\infty$ and $max$ to $0$, one checks with a non-parametric model checker that
$A[(\infty, 0)] \models \exists\Diamond S_1$. Then Proposition 3 (together with $[\![A]\!]_v = [\![A[v]]\!]$) yields that
$S_1$ is reachable in $[\![A]\!]_{(\lambda,\mu)}$ for all extended parameter valuations $0 \leq \lambda, \mu \leq \infty$.

Clearly, $[\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond S_2$ iff $\lambda(min) \leq \mu(max) \wedge \lambda(min) < \infty$. We will see
in this running example how we can verify this property completely by
non-parametric model checking. To this end, we construct the automaton $A'$ from
$A$ by substituting the parameter $max$ by the parameter $min$, yielding a (non-L/U)
automaton with one parameter, $min$. If we show that $[\![A']\!]_v \models \exists\Diamond S_2$ for all
valuations $v$, this essentially means that $[\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond S_2$ for all $\lambda, \mu$ such that
$\mu(max) = \lambda(min) < \infty$, and then Proposition 3 implies that $[\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond S_2$
for all $\lambda, \mu$ with $\lambda(min) \leq \mu(max)$ and $\lambda(min) < \infty$.
[Figure: location S1 with invariant x <= max]

Fig. 2. Reducing parametric to non-parametric model checking



The question whether there exists a (non-extended) parameter valuation such
that a given (final) location $q$ is reachable is known as the emptiness problem
for PTAs. In [2], it is shown that the emptiness problem is undecidable for
PTAs with three clocks or more. Proposition 3 implies $\exists \lambda, \mu : [\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond q$ iff
$A[(0, \infty)] \models \exists\Diamond q$. Here, $(\lambda, \mu)$ range over extended parameter valuations, but it is
not difficult to see that the statement also holds when $(\lambda, \mu)$ range over ordinary valuations. Since
$A[(0, \infty)]$ is a non-parametric timed automaton and reachability is decidable
for timed automata ([1]), the emptiness problem is decidable for L/U automata.
Then it follows that the dual problem is also decidable for L/U automata. This is
the universality problem for invariance properties, asking whether an invariance
property holds for all parameter valuations.

Corollary 1. The emptiness problem is decidable for L/U automata. 

Definition 9. A PTA $A$ is fully parametric if clocks are only reset to 0 and
every linear expression in $A$ is of the form $t_1 \cdot p_1 + \cdots + t_n \cdot p_n$, where $t_i \in \mathbb{Z}$.

The following proposition is basically the observation in [1] that multiplication
of each constant in a timed automaton and in a system property with the
same positive factor preserves satisfaction.

Proposition 4. Let $A$ be a fully parametric PTA. Then

$[\![A]\!]_v \models \psi \iff \forall t > 0 : [\![A]\!]_{t \cdot v} \models t \cdot \psi$,

where $t \cdot v$ denotes the valuation $p \mapsto t \cdot v(p)$ and $t \cdot \psi$ the formula obtained from
$\psi$ by multiplying each number in $\psi$ by $t$.

Then for fully parametric PTAs with one parameter and system properties
$\psi$ without constants (except for 0), we have $[\![A]\!]_v \models \psi$ for all valuations $v$ of $P$
if and only if both $A[0] \models \psi$ and $A[1] \models \psi$.

Corollary 2. For fully parametric PTAs with one parameter and properties $\psi$
without constants (except 0), it is decidable whether $\forall v \in [\![C]\!] : [\![A]\!]_v \models \psi$.
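
The reduction to the two instances $A[0]$ and $A[1]$ can be spelled out as follows. This is a sketch of the reasoning only; $\mathbf{1}$ is written for the valuation assigning 1 to the single parameter $p$, a name introduced here purely for illustration.

```latex
\begin{align*}
v(p) = c > 0
  &\;\Longrightarrow\; v = c \cdot \mathbf{1}, \\
[\![A]\!]_{\mathbf{1}} \models \psi
  &\;\Longleftrightarrow\; \forall t > 0 :
     [\![A]\!]_{t \cdot \mathbf{1}} \models t \cdot \psi
     && \text{(Proposition 4)} \\
  &\;\Longleftrightarrow\; \forall t > 0 :
     [\![A]\!]_{t \cdot \mathbf{1}} \models \psi
     && \text{($\psi$ has no constants except 0, so $t \cdot \psi = \psi$).}
\end{align*}
```

Hence every valuation with $v(p) > 0$ agrees with $A[1]$ on $\psi$, and the only remaining valuation, $v(p) = 0$, is covered by $A[0]$.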

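Example 3. The PTA $A'$ mentioned in Example 2 is a fully parametric automaton
and the property $\exists\Diamond S_2$ is without constants. We establish that $A'[0] \models \exists\Diamond S_2$
and $A'[1] \models \exists\Diamond S_2$. Then Proposition 4 implies that $[\![A']\!]_v \models \exists\Diamond S_2$ for all $v$.
As shown in Example 2, this implies that $[\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond S_2$ for all $\lambda, \mu$ with
$\lambda(min) = \mu(max) < \infty$.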
In the running example, we would like to use the above methods to verify
that $[\![A]\!]_{(\lambda,\mu)} \not\models \exists\Diamond S_2$ if $\lambda(min) > \mu(max)$. In this case we cannot substitute
$min = max$, since the bound in the constraint is strict. The following definition
and result allow us to move the strictness of a constraint into the PTA.

Definition 10. Define $A^{<}$ as the automaton obtained by replacing every
inequality $x - y \leq e$ in $A$ by a strict inequality $x - y < e$, provided that $e$ contains
at least one parameter.



Proposition 5. Let A be an L/U automaton. Then 

M<1(a N VA < A', //'<//: |=vn<?i. 

M<1(a.,) N 30,/- ^ VA' < A, N 30,^. 

We claim that we can extend the result above to a more general construction
$A^{<}_{P'}$, where we replace a guard $x - y \leq e$ by $x - y < e$ if and only if a parameter
$p$ from $P'$ occurs in $e$. Then the proposition generalizes to $A^{<}_{P'}$, provided that
we replace $\lambda < \lambda'$ by $\lambda <_{P'} \lambda'$ (and similar replacements for $\lambda' < \lambda$, $\mu < \mu'$,
$\mu' < \mu$). Here, $v <_{P'} v'$ is defined as $v(p) < v'(p)$ if $p \in P'$ and $v(p) = v'(p)$
otherwise.

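Example 4. Consider the PTA $A^{<}$, which equals the PTA in Fig. 2, except that
$x \leq max$ has been replaced by $x < max$ and $x \geq min$ by $x > min$. Now, we
construct the automaton $A'^{<}$ from $A^{<}$ by substituting the parameter $max$ by
$min$. By checking that $A'^{<}[0] \models \forall\Box\neg S_2$ and $A'^{<}[1] \models \forall\Box\neg S_2$, Proposition 4 yields
that $[\![A'^{<}]\!]_v \models \forall\Box\neg S_2$ for all valuations $v$. Then we know by Proposition 3 that
$[\![A^{<}]\!]_{(\lambda,\mu)} \models \forall\Box\neg S_2$ if $\infty > \lambda(min) \geq \mu(max)$. Now, Proposition 5 concludes
that if $\infty > \lambda(min) > \mu(max)$ then $[\![A]\!]_{(\lambda,\mu)} \models \forall\Box\neg S_2$, i.e. $[\![A]\!]_{(\lambda,\mu)} \not\models \exists\Diamond S_2$.

Combining the results from the examples in this section yields $[\![A]\!]_{(\lambda,\mu)} \models \exists\Diamond S_2$
if and only if $\lambda(min) \leq \mu(max) \wedge \lambda(min) < \infty$.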

5 Experiments 

In this section, we report on the results of experimenting with a prototype
extension of UPPAAL described in the previous sections. For lack of space, we give
a short impression of the experiments, which are described in greater detail in
the full version [10].

The Root Contention Protocol The root contention protocol is part of a leader
election protocol in the physical layer of the IEEE 1394 standard (FireWire/i-Link),
which is used to break symmetry between two nodes contending to be the
root of a tree, spanned in the network topology.

We use the UPPAAL models of [14,13], turn the constants used into parameters,
and experiment with our prototype implementation (see Fig. 3 for results*).
In both models, there are five constants, all of which are parameters in our
experiments. We have checked for safety and liveness on the parametric models,
and have applied reductions as proposed in Section 4 where this was possible,
to reduce the verification effort. In some cases, we could even derive the
parametric conclusions by non-parametric model checking, which we have done with
standard UPPAAL.

* All experiments were performed on a 366 MHz Celeron, except the liveness property,
which was checked on a 333 MHz SPARC Ultra Enterprise.
Fig. 3. Experimental results for the root contention protocol



The Bounded Retransmission Protocol This protocol was designed by Philips for
communication between remote controls and audio/video/TV equipment. In [7]
constraints for the correctness of the protocol are derived by hand, and some
instances are checked using UPPAAL. Based on the models in [7], an automatic
parametric analysis is performed in [3]; however, no further results are given.
Fig. 4. Experimental results for the bounded retransmission protocol



For our analysis we have also used the timed automata models from [7]. In
[7] three different constraints are presented based on three properties which are
needed to satisfy the safety specification of the protocol. We are only able to
check two of these since one of the properties contains a parameter which our
prototype version of UPPAAL is not able to handle yet. The results can be found
in Fig. 4*. Note that out of the four constants in the model which are candidates
for parameters, the models checked for properties 'safety 1' and 'safety 2' use two
and one as parameters, respectively. A minor error in [7] was found while checking
'safety 1', which has been corrected by the authors of [7].

* All experiments were run on a 333 MHz SPARC Ultra Enterprise.
Abstract. Process algebras with abstraction have been widely used for
the specification and verification of non-probabilistic concurrent systems.
The main strategy in these algebras is introducing a constant, denoting
an internal action, and a set of fairness rules. Following the same
approach, in this paper we propose a fully probabilistic process algebra
with abstraction which contains a set of verification rules as
counterparts of the fairness rules in standard ACP-like process algebras with
abstraction. Having probabilities present and employing the results from
Markov chain analysis, these rules are expressible in a very intuitive
way. In addition to this algebraic approach, we introduce a new
version of probabilistic branching bisimulation for the alternating model of
probabilistic systems. Different from other approaches, this bisimulation
relation requires the same probability measure only for specific related
processes called entries. We claim this definition corresponds better with
intuition. Moreover, the fairness rules are sound in the model based on
this bisimulation. Finally, we present an algorithm to decide our
branching bisimulation with a polynomial-time complexity in the number of the
states of the probabilistic graph.



1 Introduction and Motivation 

In this work we treat the problem of abstraction from internal actions in fully
probabilistic process algebra and its model based on branching bisimulation.
One of the motives to introduce probabilities in formal methods is that they
can be used to model fairness. Since the idea of fairness rules ([5]) together with
abstraction (introduced by the abstraction operator $\tau_I$ and a constant $\tau$ denoting
an internal action) is central to the verification techniques in process algebra, we
introduce verification rules in fully probabilistic process algebra that arise in a
rather natural way from the ones defined in standard process algebra. These rules
express the idea that, due to a non-zero probability for a system to execute
an external action, abstraction from internal steps will yield the external step(s)
with probability 1 after finitely many repetitions. For example, if one process can
execute external action $a$ with probability $\pi$, external action $b$ with probability
$\rho$, and with probability $1 - \pi - \rho$ after executing an internal action it behaves the
same as initially, then it is clear that the probability to perform the internal step
infinitely many times is equal to 0, or in other words, the probability to perform
either $a$ or $b$ eventually is 1. Next, the question arises: “With what probability
does $a$ (resp. $b$) occur?”. [9] gives the answer to this question: the probability of
$a$ is $\pi/(\pi + \rho)$ and the probability of $b$ is $\rho/(\pi + \rho)$. This corresponds to the
absorption probabilities for the Markov chain given in Figure 1a. In our theory
the notion that relates these two processes can be easily expressed with the
following verification rule: if $X = a \boxplus_{\pi} b \boxplus_{\rho} i \cdot X$ (where $i$ is an internal action),
then $\tau \cdot \tau_{\{i\}}(X) = \tau \cdot (a \boxplus_{\pi/(\pi+\rho)} b)$. (For more details of the semantics see [2] and
[7].) A reader familiar with process algebra can easily see the resemblance with
the KFAR$_1$ rule ([7]) with non-deterministic choice replaced by probabilistic
choice.
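
The two absorption probabilities quoted from [9] follow from a geometric series over the number of internal steps taken before the first visible action; the derivation below is a standard calculation added for illustration and assumes $\pi + \rho > 0$.

```latex
\begin{align*}
P(a) &= \sum_{n \ge 0} (1 - \pi - \rho)^n \, \pi
      = \frac{\pi}{1 - (1 - \pi - \rho)}
      = \frac{\pi}{\pi + \rho}, \\
P(b) &= \sum_{n \ge 0} (1 - \pi - \rho)^n \, \rho
      = \frac{\rho}{\pi + \rho},
\qquad P(a) + P(b) = 1 .
\end{align*}
```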




Fig. 1. Absorbing Markov chains. 



Proceeding with similar reasoning for the more complex rule KFAR$_2$ we reach
a situation in which the definition of weak (branching) bisimulation proposed
in [9] cannot abstract away the internal cycle. But working with recursive
equations in our process algebra we can introduce a counterpart of this rule in the
probabilistic setting in the following way:

$X_1 = i \cdot X_2 \boxplus_{\pi} Y_1$
$X_2 = i \cdot X_1 \boxplus_{\rho} Y_2, \quad I = \{i\}$ (PVR2)
$\tau \cdot \tau_I(X_1) = \tau \cdot (\tau_I(Y_1) \boxplus_{\alpha} \tau_I(Y_2))$

where $X_1$ is the root variable and $\alpha = (1 - \pi)/(1 - \pi\rho)$. The transition systems
(as defined in [9]) for these processes are given in Figure 2 (for the sake of
simplicity the initial internal steps are not shown) and the corresponding Markov
chain of the first process is shown in Figure 1b. The values of probabilities $\alpha$
and $\beta = 1 - \alpha = \pi(1 - \rho)/(1 - \pi\rho)$ are obtained as the absorption probabilities
when $X_1$ is the root variable, that is, 1 is “the initial state of the system” (in
terms of the Markov chain theory) for the Markov chain in Figure 1b. We point
out that the absorption probabilities for this system differ for various initial
distributions.
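
The dependence on the initial state can be checked with a small exact computation. The sketch below solves the first-step equations of the two-state cycle by hand and evaluates them with rational arithmetic; the function names are hypothetical, and `[+]_pi` in the comments stands for the probabilistic choice operator.

```python
from fractions import Fraction

def absorption_from_x1(pi, rho):
    """Probabilities (alpha, beta) of ending in Y1 resp. Y2 when starting in X1.

    For X1 = i.X2 [+]_pi Y1 and X2 = i.X1 [+]_rho Y2 the first-step
    equation is  alpha = (1 - pi) + pi * rho * alpha.
    """
    alpha = (1 - pi) / (1 - pi * rho)
    beta = pi * (1 - rho) / (1 - pi * rho)
    return alpha, beta

def absorption_from_x2(pi, rho):
    """Same chain started in X2: probabilities of ending in Y1 resp. Y2."""
    to_y1 = rho * (1 - pi) / (1 - pi * rho)
    to_y2 = (1 - rho) / (1 - pi * rho)
    return to_y1, to_y2

pi, rho = Fraction(1, 2), Fraction(1, 3)
from_x1 = absorption_from_x1(pi, rho)
from_x2 = absorption_from_x2(pi, rho)
```

With these values the two starting points give different distributions over `Y1` and `Y2`, although each pair sums to 1.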

Further on, we define a probabilistic branching bisimulation relation on the
set of fully probabilistic graphs that abstracts away internal actions in a wider
variety of cases than the definition in [9]. It will turn out that the set of graphs
modulo this relation gives rise to a model of our process algebra. Two nodes are
considered bisimilar if they have the same branching structure and if taken as
roots they have the same probability measures. This means that if the system
is in either of these states then the probabilities to execute a visible action and
also the probabilities to enter with an internal step into a different equivalence
class are the same. In this definition the notion of an entry plays a central role
(a notion to be formally defined in Definition 10 on page 212). Starting from the
roots of the graphs we build a set of entries for which the probability measure
is checked in the next stage. Informally, we can say that a node not found as an
entry is just involved in (is just a part of) an internal path (or a cycle) that starts
in an entry with the same branching structure as the non-entry node; the node
is not involved in any other path. Also, after execution of a visible action the
system never goes to the state interpreted by this non-entry node. For example,
in the graph in Figure 2a, $\tau_I(X_2)$ is a non-entry node (no path from $\tau_I(Y_1)$ or
$\tau_I(Y_2)$ goes back to $\tau_I(X_2)$).

Fig. 2. Related fully probabilistic processes.

Moreover, in the paper we give an algorithm which decides the probabilis- 
tic branching bisimulation in polynomial time in the number of states of the 
probabilistic graph. 

Because parallel composition based on interleaving (see e.g. [2,3]) includes 
non-determinism, this algebra and semantics cannot deal with such parallel com- 
position. However, since we define the branching bisimulation on the alternating 
model, and we also consider the non-deterministic (or action) nodes in our defini- 
tion of bisimulation, we expect that the extension with non-deterministic choice 
can be achieved on the basis of the results presented here. This will enable the 
extension with interleaving parallel composition. 

Motivating example. In order to depict the idea of our approach we give the
following motivating example. An experimenter has two coins $A$ and $B$. $A$ is a
fair coin with the probability distribution {1/2 head, 1/2 tail}, and $B$ is biased
with distribution {1/3 head, 2/3 tail}. First he throws coin $A$. If head turns up
the throwing is over and he announces “head”. If tail shows up then he throws
coin $B$. If tail turns up then the throwing is over and he announces “tail”, but
if head turns up then he takes coin $A$ and performs the experiment again. The
process can be specified by the following recursive specification:

$A = tail_A \cdot B \boxplus_{1/2} head_A \cdot sayhead$
$B = head_B \cdot A \boxplus_{1/3} tail_B \cdot saytail$

where $sayhead$ and $saytail$ are atomic actions expressing the observable events of
announcing “head” and “tail”, respectively. Abstracting from $head_A$, $tail_A$, $head_B$
and $tail_B$ and applying the rule PVR2 we obtain that the probability to end the
experiment by saying “head” or “tail” is 3/5 and 2/5, respectively. We point out
that coin $A$ was chosen as the initial coin. We can imagine that the throwing is
performed in an isolated room and an observer can hear only the final outcome,
but for her it is not clear what kind of experiment is performed in the room.

The observer will make the same observation if the experimenter performs
some other experiment. Namely, instead of two coins he has only one fair die. He
rolls it and if the outcome is 1 he rolls it again. If the outcome is an even number
he announces “head” and if the outcome is 3 or 5 he announces “tail”. □
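
That the two experiments are indistinguishable for the observer can be verified with exact arithmetic. The sketch below describes one "round" of each experiment by the probabilities of announcing head, announcing tail, or starting over, and then conditions on the round not being repeated; the function and dictionary names are illustrative assumptions.

```python
from fractions import Fraction as F

def final_distribution(round_outcomes):
    """Outcome distribution of an experiment repeated until it does not end
    in "again". round_outcomes maps outcome -> probability in one round."""
    again = round_outcomes.get("again", F(0))
    return {o: p / (1 - again)
            for o, p in round_outcomes.items() if o != "again"}

# Coins: A gives head (1/2) or tail (1/2); after tail, B gives tail (2/3)
# or head (1/3), and head on B restarts the experiment with coin A.
coins = {"head": F(1, 2),
         "tail": F(1, 2) * F(2, 3),
         "again": F(1, 2) * F(1, 3)}

# Die: an even number -> "head", 3 or 5 -> "tail", 1 -> roll again.
die = {"head": F(3, 6), "tail": F(2, 6), "again": F(1, 6)}

coin_result = final_distribution(coins)
die_result = final_distribution(die)
```

Both experiments have the same per-round probabilities (1/2, 1/3, 1/6), so their final distributions coincide.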

Related Work We have already mentioned the relation of this paper with
[9]. Namely, the branching (weak) bisimulation for fully probabilistic systems
presented in [9] is finer than ours. Reasoning in the process algebra which we
present here, with the rules PVR1, PVR2, . . . and speaking informally, their
bisimulation defines just a model of PVR1, not of PVR2 and more complex
rules (see further on).

Similarities of the bisimulation defined here can be found with the branching
bisimulation of Jou in [11]. The proposed probabilistic branching bisimulation in
the latter work is defined on the set of finite trees, and the author's attention is
more focused on the axiom $a \cdot (\tau \cdot (x \boxplus_{\pi} y) \boxplus_{\rho} y) = a \cdot (x \boxplus_{\pi\rho} y)$ than on any rules
that treat internal cycles. Our definition of bisimulation coincides with his
on the set of finite trees (in terms of process algebra, the set of closed terms),
and the branching bisimulation presented in this paper is an extension of his
branching bisimulation over infinite processes.

2 Definitions and Results 

In [2] a probabilistic process algebra containing both probabilistic choice and
non-deterministic choice is introduced. Our current work is based on a
sub-algebra of that one for which non-deterministic choice has been excluded.
Having both choices and abstraction at the same time leads to a more complex
axiomatization and this extension, we think, can be achieved on the basis of the
definitions we give here. Due to the absence of non-determinism the interleaving
parallel composition as treated in [2] cannot be incorporated in this fully
probabilistic process algebra. On the other hand, some version of synchronous parallel
composition may be considered in such a process algebra (also see [6]).

In addition to the set of atomic actions $A$ and the constant $\tau$ the fully
probabilistic process algebra presented here has three operators: the probabilistic
choice operator $\boxplus_{\pi}$, $\pi \in (0, 1)$, the sequential composition $\cdot$ and the abstraction
operator $\tau_I$ for $I \subseteq A$. The axiom system is given in Tables 1 and 2. Informally,
process $x \boxplus_{\pi} y$ behaves as $x$ with probability $\pi$ and as $y$ with probability $1 - \pi$.
Also, process $x \boxplus_{\pi} y \boxplus_{\rho} z$ behaves as $x$ with probability $\pi$, as $y$ with probability
$\rho$ and as $z$ with probability $1 - \pi - \rho$. This algebra will be denoted by $prBPA_{\tau}$.
We also add to the algebra a set of verification rules PVR1 and PVRn for $n \geq 2$:
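$$\frac{X_1 = i \cdot X_1 \uplus_\pi Y_1, \quad \tau \neq i \in I}{\tau \cdot \tau_I(X_1) = \tau \cdot \tau_I(Y_1)} \quad \text{(PVR1)}$$

$$\frac{\begin{array}{l} X_1 = i_1 \cdot X_2 \uplus_{\pi_1} Y_1 \\ X_2 = i_2 \cdot X_3 \uplus_{\pi_2} Y_2 \\ \quad \vdots \\ X_{n-1} = i_{n-1} \cdot X_n \uplus_{\pi_{n-1}} Y_{n-1} \\ X_n = i_n \cdot X_1 \uplus_{\pi_n} Y_n \\ I \cup \{\tau\} \supseteq \{i_1, i_2, \ldots, i_n\} \neq \{\tau\} \end{array}}{\tau \cdot \tau_I(X_1) = \tau \cdot (\tau_I(Y_1) \uplus_{\alpha_1} \tau_I(Y_2) \uplus_{\alpha_2} \cdots \uplus_{\alpha_{n-2}} \tau_I(Y_{n-1}) \uplus_{\alpha_{n-1}} \tau_I(Y_n))} \quad \text{(PVRn)}$$

where $\alpha_1 = \frac{1 - \pi_1}{1 - \pi_1 \pi_2 \cdots \pi_n}$, $\alpha_j = \frac{\pi_1 \cdots \pi_{j-1} (1 - \pi_j)}{1 - \pi_1 \pi_2 \cdots \pi_n}$ for $j : 1 < j < n$ and $\pi_k \in (0, 1)$ 
for $k : 1 \leq k \leq n$. If we refer to the algebra extended with these rules we write 
$prBPA_\tau + PVR1 + PVR2 + \ldots$. 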
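
Read as a Markov chain, the premises of PVRn describe a cycle of internal steps that can be left through one of the processes $Y_j$. The weights in the conclusion can then be understood as the probabilities of eventually leaving the cycle through each $Y_j$ when starting in $X_1$. The following Python sketch computes these exit probabilities under that reading; the function name and the list encoding of the $\pi_j$ are assumptions made for illustration only.

```python
from fractions import Fraction

def cycle_exit_probabilities(pis):
    """pis[j] is the probability of moving on to the next variable of the cycle;
    with probability 1 - pis[j] the cycle is left through the corresponding Y.
    Returns, for each Y, the probability of eventually leaving through it."""
    stay = Fraction(1)            # probability of completing one full round
    for p in pis:
        stay *= p
    exits = []
    reach = Fraction(1)           # probability of reaching the current variable in a round
    for p in pis:
        exits.append(reach * (1 - p) / (1 - stay))
        reach *= p
    return exits

# Hypothetical two-variable cycle with pi_1 = 1/2 and pi_2 = 1/3.
alphas = cycle_exit_probabilities([Fraction(1, 2), Fraction(1, 3)])
```

For a cycle of length one the single exit probability is 1, which matches the shape of PVR1, where no probabilistic choice remains in the conclusion.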

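$(x \cdot y) \cdot z = x \cdot (y \cdot z)$    A5 

$x \uplus_\pi y = y \uplus_{1-\pi} x$    PrAC1 

$x \uplus_\pi (y \uplus_\rho z) = (x \uplus_{\frac{\pi}{\pi+\rho-\pi\rho}} y) \uplus_{\pi+\rho-\pi\rho} z$    PrAC2 

$x \uplus_\pi x = x$    PrAC3 

$(x \uplus_\pi y) \cdot z = x \cdot z \uplus_\pi y \cdot z$    PrAC4 

Table 1. Axioms for probabilistic choice and sequential composition. 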
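
The re-bracketing axiom PrAC2 can be sanity-checked by comparing the probability with which each operand is chosen on both sides. The sketch below does this in Python for one hypothetical choice of $\pi$ and $\rho$, assuming PrAC2 is read as $x \uplus_\pi (y \uplus_\rho z) = (x \uplus_{\pi/\sigma} y) \uplus_\sigma z$ with $\sigma = \pi + \rho - \pi\rho$; the dictionary encoding of processes as outcome distributions is an illustration, not part of the algebra.

```python
from fractions import Fraction

def choice(pi, left, right):
    """Distribution of 'left with probability pi, right with probability 1 - pi'.
    Operands are dicts mapping outcome names to probabilities."""
    dist = {}
    for outcome, p in left.items():
        dist[outcome] = dist.get(outcome, 0) + pi * p
    for outcome, p in right.items():
        dist[outcome] = dist.get(outcome, 0) + (1 - pi) * p
    return dist

x, y, z = {"x": Fraction(1)}, {"y": Fraction(1)}, {"z": Fraction(1)}
pi, rho = Fraction(1, 2), Fraction(1, 3)       # hypothetical values
sigma = pi + rho - pi * rho

lhs = choice(pi, x, choice(rho, y, z))          # x with pi, otherwise (y with rho, else z)
rhs = choice(sigma, choice(pi / sigma, x, y), z)
```

Both sides give the same distribution over x, y and z, which is the bookkeeping that PrAC2 encodes.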

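$x \cdot \tau = x$    T1 

$\tau_I(\tau) = \tau$    TI0 

$\tau_I(a) = a$  if $a \notin I$    TI1 

$\tau_I(a) = \tau$  if $a \in I$    TI2 

$\tau_I(x \cdot y) = \tau_I(x) \cdot \tau_I(y)$    TI4 

$\tau_I(x \uplus_\pi y) = \tau_I(x) \uplus_\pi \tau_I(y)$    PrTI 

Table 2. Axioms for abstraction ($I \subseteq A$). 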

The reader has noticed that we use a set of equations of the form $X_j = P_j$, $j = 
1, \ldots, n$, to specify recursive behaviour. In the above equations, $X_1, \ldots, X_n$ are 
pairwise distinct variables and $P_1, \ldots, P_n$ are guarded terms over the given 
signature (see e.g. [7]). Every recursive specification has a root variable. In the 
verification rule PVRn, $X_1$ is the root variable of the recursive specification. 

In order to construct a model of this algebra we introduce fully probabilistic 
graphs. Further on, we define probabilistic branching bisimulation. We work in 
the alternating model with two types of nodes (processes): probabilistic nodes 
with probabilistic outgoing transitions only, denoted by $\leadsto$, and action nodes 
with action transitions only, denoted by $\xrightarrow{a}$ for $a \in A_\tau$. By allowing at most 
one action transition to leave an action node we obtain the alternating model of 
fully probabilistic process algebra. 

Definition 1. Let $A$ be a countable set of atomic actions. A fully probabilistic 
graph $g$ is a tuple $(S_p \cup S_n \cup \{NIL\}, \leadsto, \rightarrow, \mu, root)$ consisting of: 

- a countable set $S_p$ of probabilistic states, 

- a countable set $S_n$ of action states such that $S_p \cap S_n = \emptyset$ and $NIL \notin S_p \cup S_n$, 

- $root \in S_p$, 

- a relation $\leadsto \subseteq S_p \times S_n$, 

- a function $\rightarrow : S_n \rightarrow (S_p \cup \{NIL\}) \times A_\tau$, and 

- a partial function $\mu : S_p \times S_n \rightarrow (0, 1]$ such that $\mu(p, n)$ is defined iff 
  $(p, n) \in \leadsto$ for $(p, n) \in S_p \times S_n$, and for any $p \in S_p$, $\sum_{n \in S_n} \mu(p, n) = 1$. 

We denote $S = S_p \cup S_n$. If $S$ is a finite set then we say that the probabilistic 
graph $g$ is finite. $NIL$ is called the terminating state. If $NIL$ is not reachable 
from the root of $g$ then it can be ignored. Function $\mu$ is called the probability 
distribution function of $g$. 

If $(p, n) \in \leadsto$, we write $p \leadsto n$. If $\rightarrow(n) = (p, a)$ we write $n \xrightarrow{a} p$. For the sake 
of simplicity, instead of writing the value of function $\mu$ separately, if $p \leadsto n$ we 
write $p \stackrel{\mu(p,n)}{\leadsto} n$. By $G$ we denote the set of all finite fully probabilistic graphs. 
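
As an illustration, the conditions of Definition 1 can be written down as a small data structure with a consistency check. The Python sketch below is one possible encoding, not the authors' implementation; the class name, the field names and the use of the string "NIL" for the terminating state are assumptions.

```python
from dataclasses import dataclass
from fractions import Fraction

NIL = "NIL"   # assumed name for the terminating state

@dataclass
class FPGraph:
    prob_states: set      # S_p
    action_states: set    # S_n
    prob: dict            # (p, n) -> probability; the keys form the probabilistic relation
    action: dict          # n -> (target, label): the single action transition of n
    root: object

    def check(self):
        """Check the side conditions of Definition 1; raises AssertionError on violation."""
        assert self.prob_states.isdisjoint(self.action_states)
        assert NIL not in self.prob_states | self.action_states
        assert self.root in self.prob_states
        for (p, n), w in self.prob.items():
            assert p in self.prob_states and n in self.action_states
            assert 0 < w <= 1
        for p in self.prob_states:
            # the outgoing probabilities of every probabilistic state sum up to 1
            assert sum(w for (q, _), w in self.prob.items() if q == p) == 1
        for n in self.action_states:
            target, _label = self.action[n]
            assert target in self.prob_states or target == NIL
        return True

def atomic(a):
    """Graph of a single action a: one probabilistic state, one action state, then NIL."""
    return FPGraph({"sp"}, {"sn"}, {("sp", "sn"): Fraction(1)}, {"sn": (NIL, a)}, "sp")
```

Storing the action transitions as a dictionary keyed by action state enforces that each action node has exactly one outgoing transition, which is what makes the graph fully probabilistic.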

Note: If $\rightarrow$ is not a function from $S_n$ to $(S_p \cup \{NIL\}) \times A_\tau$, but a subset of 
$S_n \times (S_p \cup \{NIL\}) \times A_\tau$, we get the general class of probabilistic graphs, including 
non-determinism. In this case action nodes are rather called non-deterministic 
nodes, as we do in [2,3]. 

Definition 2. Let $g = (S \cup \{NIL\}, \leadsto, \rightarrow, \mu, root)$ be a fully probabilistic graph. 
We say that $g$ is root acyclic if there is no node $n \in S_n$ and $a \in A_\tau$ such that 
$n \xrightarrow{a} root$. Otherwise we say that $g$ is root cyclic. 

We define the root unwinding map $\rho : G \rightarrow G$ as follows: 

- if $g$ is root acyclic, then $\rho(g) = g$; 

- if $g$ is root cyclic, then $\rho(g) = (S \cup \{NIL\} \cup \{newroot\}, \leadsto', \rightarrow, \mu', newroot)$, 

  where $newroot$ is a new node, $\leadsto' = \leadsto \cup \{(newroot, n) : root \leadsto n\}$ and 

  $\mu'(p, n) = \mu(p, n)$ if $p \in S_p$, and $\mu'(p, n) = \mu(root, n)$ if $p = newroot$. 
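
The root unwinding map can be sketched as follows, assuming a graph is given by a dictionary of probabilistic transitions and a dictionary of action transitions (an encoding chosen here for illustration; the function and parameter names are hypothetical). A fresh root copies the outgoing distribution of the old root, so that no transition leads back to the root of the result.

```python
from fractions import Fraction

def unwind_root(prob, action, root, new_root="newroot"):
    """prob: dict (p, n) -> probability; action: dict n -> (target, label).
    Returns (prob, action, root) of the unwound graph."""
    if all(target != root for target, _ in action.values()):
        return prob, action, root             # root acyclic: the graph is unchanged
    new_prob = dict(prob)
    for (p, n), w in prob.items():
        if p == root:
            new_prob[(new_root, n)] = w        # newroot copies the distribution of root
    return new_prob, action, new_root

# Hypothetical root-cyclic graph: r goes to n1 or n2 with probability 1/2 each,
# n1 returns to r with an internal step, n2 terminates with action a.
prob = {("r", "n1"): Fraction(1, 2), ("r", "n2"): Fraction(1, 2)}
action = {"n1": ("r", "tau"), "n2": ("NIL", "a")}
unwound = unwind_root(prob, action, "r")
```

The old root stays in the graph as an ordinary probabilistic state, which is why the unwound graph behaves like the original one.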



Proposition 1. Let $g$ be a fully probabilistic graph. 

i. Then $\rho(g)$ is root acyclic. 

ii. $g \underline{\leftrightarrow} \rho(g)$; that is, $g$ and $\rho(g)$ are strongly bisimilar. □ 



Interpretation of Constants and Operators in G 

Definition 3. (Interpretation of the constants) If $a \in A_\tau$, its interpretation is 
$[a] = (\{s_p\} \cup \{s_n\} \cup \{NIL\}, \{s_p \leadsto s_n\}, \{s_n \xrightarrow{a} NIL\}, \mu(s_p, s_n) = 1, s_p)$. 

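Definition 4. (Interpretation of the operators) Let $g$ and $h$ be graphs in $G$ and 

$g = (S_g \cup \{NIL_g\}, \leadsto_g, \rightarrow_g, \mu_g, root_g)$ and $h = (S_h \cup \{NIL_h\}, \leadsto_h, \rightarrow_h, \mu_h, root_h)$. 

Sequential composition: $g \cdot h$ is defined as: 

$(S_g \cup S_h \cup \{NIL_h\}, \leadsto_g \cup \leadsto_h, \rightarrow, \mu, root_g)$, 

where: $\rightarrow = (\rightarrow_g \setminus \{n \xrightarrow{a}_g NIL_g : n \in S_g, a \in A_\tau\}) \cup \rightarrow_h 
\cup \{n \xrightarrow{a} root_h : n \in S_g, a \in A_\tau, n \xrightarrow{a}_g NIL_g\}$, 

and $\mu(p, n) = \mu_g(p, n)$ if $p, n \in S_g$, and $\mu(p, n) = \mu_h(p, n)$ if $p, n \in S_h$. 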



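Probabilistic choice: for $\pi \in (0, 1)$, $g \uplus_\pi h$ is defined as: 

$(S \cup \{NIL\}, \leadsto, \rightarrow, \mu, root)$, 

where: $S = (S_g \setminus \{root_g\}) \cup (S_h \setminus \{root_h\}) \cup \{root\}$, $root \notin S_g \cup S_h$, 

$\leadsto = (\leadsto_g \setminus \{root_g \leadsto n : n \in S_g\}) \cup (\leadsto_h \setminus \{root_h \leadsto n : n \in S_h\}) 
\cup \{root \leadsto n : n \in S_g, root_g \leadsto n\} \cup \{root \leadsto n : n \in S_h, root_h \leadsto n\}$, 

$\rightarrow = \rightarrow_g \cup \rightarrow_h$ with the remark that $NIL_g$ and $NIL_h$ are identified 
and this node is named $NIL$, and 

$\mu(p, n) = \mu_g(p, n)$ if $p, n \in S_g \setminus \{root_g\}$, 

$\mu(p, n) = \mu_h(p, n)$ if $p, n \in S_h \setminus \{root_h\}$, 

$\mu(p, n) = \pi \cdot \mu_g(root_g, n)$ if $p = root$ & $n \in S_g$ & $root_g \leadsto n$, 

$\mu(p, n) = (1 - \pi) \cdot \mu_h(root_h, n)$ if $p = root$ & $n \in S_h$ & $root_h \leadsto n$. 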

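Abstraction: $\tau_I(g)$ for $I \subseteq A$ is defined as: 

$(S_g \cup \{NIL_g\}, \leadsto_g, \rightarrow, \mu_g, root_g)$, 

where: $p \xrightarrow{a} n$ iff $p \xrightarrow{a}_g n$ and $a \notin I$, and 
$p \xrightarrow{\tau} n$ iff $p \xrightarrow{a}_g n$ and $a \in I \cup \{\tau\}$. 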
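
The construction for probabilistic choice can be sketched in Python as follows. The sketch assumes that graphs are given as triples (prob, action, root) with disjoint state names apart from a shared "NIL", and that both operands are root acyclic; these encodings and the function name are assumptions for illustration. The new root takes the distribution of the left root scaled by $\pi$ and that of the right root scaled by $1 - \pi$.

```python
from fractions import Fraction

def prob_choice(pi, g, h, root="root"):
    """Probabilistic choice of two graphs given as (prob, action, root) triples.
    prob: dict (p, n) -> probability; action: dict n -> (target, label)."""
    prob = {}
    for (g_prob, _, g_root), weight in ((g, pi), (h, 1 - pi)):
        for (p, n), w in g_prob.items():
            if p == g_root:
                prob[(root, n)] = weight * w    # scaled copy of the old root's distribution
            else:
                prob[(p, n)] = w                # all other distributions are unchanged
    action = {**g[1], **h[1]}                   # both NIL states coincide by name
    return prob, action, root

# Interpretations of two atomic actions a and b, combined with pi = 1/3.
a = ({("ga", "na"): Fraction(1)}, {"na": ("NIL", "a")}, "ga")
b = ({("gb", "nb"): Fraction(1)}, {"nb": ("NIL", "b")}, "gb")
choice_graph = prob_choice(Fraction(1, 3), a, b)
```

The old roots disappear from the result because all of their outgoing transitions have been moved to the new root, as in the definition of $S$ above.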

Similarly to the non-probabilistic version of bisimulation relations including 
silent steps (in particular branching bisimulation) we allow here an observable 
action $a$ ($a \neq \tau$) to be simulated by a sequence of transitions such that exactly 
the last transition is an $a$-transition and the rest are internal transitions inside 
the same equivalence class. The new problem we should think about is the way 
we calculate the probability measure of such a sequence of transitions according 
to the probability distribution function $\mu$. For that reason we sketch (repeat) 
the standard concept used to define a probability measure (see [8,9]), adapted 
for the alternating model of fully probabilistic systems. 

Let $g = (S_p \cup S_n \cup \{NIL\}, \leadsto, \rightarrow, \mu, root)$ be a finite fully probabilistic graph. 

Definition 5. For $p \in S_p$, $n \in S_n$, $C \subseteq S_p \cup \{NIL\}$ and $a \in A_\tau$ we define: 

- $n \xrightarrow{a} C$ iff $\exists q \in C : n \xrightarrow{a} q$; 

- $P(p, a, C) = \sum_{n : n \xrightarrow{a} C} \mu(p, n)$ and $P(p, a, q) = P(p, a, \{q\})$; 

- An execution fragment or finite path is a nonempty finite sequence 

  $\sigma = p_0 \leadsto n_0 \xrightarrow{a_1} p_1 \leadsto n_1 \xrightarrow{a_2} p_2 \ldots p_{k-1} \leadsto n_{k-1} \xrightarrow{a_k} p_k$ 

  such that $p_0, \ldots, p_k \in S_p \cup \{NIL\}$, $n_0, \ldots, n_{k-1} \in S_n$, $a_1, \ldots, a_k \in A_\tau$. We 
  say that $\sigma$ starts in $p_0$ and we write $first(\sigma) = p_0$, and also $trace(\sigma) = 
  a_1 a_2 \ldots a_k$ and $last(\sigma) = p_k$. If $last(\sigma) = NIL$, then $\sigma$ is maximal. 

- If $k = 0$ we define $P(\sigma) = 1$. If $k \geq 1$ we define 

  $P(\sigma) = \mu(p_0, n_0) \cdot \mu(p_1, n_1) \cdot \ldots \cdot \mu(p_{k-1}, n_{k-1})$. 

- Let $Q \subseteq S_p \cup \{NIL\}$ and $\sigma$ be a finite path in the form written above. If 
  $p_0, p_1, \ldots, p_{k-1} \in Q$ then we say that $\sigma$ only passes through states in $Q$; 
  we write $\sigma_Q$. 
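
The probability of a finite path is simply the product of the probabilities of its probabilistic steps, since action steps are deterministic in this model. A minimal Python sketch, assuming a path is encoded as a list that alternates probabilistic and action states (an encoding chosen here, with hypothetical state names):

```python
from fractions import Fraction

def path_probability(prob, path):
    """prob: dict (p, n) -> probability.
    path: [p0, n0, p1, n1, ..., pk], alternating probabilistic and action states."""
    result = Fraction(1)                      # a path with k = 0 has probability 1
    for i in range(0, len(path) - 1, 2):
        result *= prob[(path[i], path[i + 1])]
    return result

# Hypothetical graph: p0 goes to n0 with 1/2, n0 leads to p1, p1 goes to n1 with 1/3.
prob = {("p0", "n0"): Fraction(1, 2), ("p0", "m0"): Fraction(1, 2),
        ("p1", "n1"): Fraction(1, 3), ("p1", "m1"): Fraction(2, 3)}
sigma = ["p0", "n0", "p1", "n1", "NIL"]
```

Here `path_probability(prob, sigma)` multiplies 1/2 and 1/3; the action labels do not influence the result and are therefore omitted from the encoding.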

Definition 6. An execution or fullpath is either a maximal execution fragment 
or an infinite sequence 

$\pi = p_0 \leadsto n_0 \xrightarrow{a_1} p_1 \leadsto n_1 \xrightarrow{a_2} p_2 \ldots$ 

such that $p_0, p_1, p_2, \ldots \in S_p$, $n_0, n_1, n_2, \ldots \in S_n$, $a_1, a_2, \ldots \in A_\tau$. A path is a 
finite path or a fullpath. 

$Path_{full}(p)$ denotes the set of fullpaths starting in $p$. Similarly, $Path_{fin}(p)$ 
($Path_{fin,Q}(p)$ for some $Q \subseteq S_p \cup \{NIL\}$) denotes the set of finite paths starting 
in $p$ (that only pass through states in $Q$). For each process $p$, $P$ induces a 
probability space on $Path_{full}(p)$ as follows. 

Let $\sigma\uparrow$ denote the basic cylinder induced by $\sigma$, that is, 

$\sigma\uparrow = \{\pi \in Path_{full}(p) : \sigma \leq_{prefix} \pi\}$, 

where $\leq_{prefix}$ is the usual prefix relation on sequences. We define $\sigma Field(p)$ to 
be the smallest sigma-field on $Path_{full}(p)$ which contains all basic cylinders $\sigma\uparrow$ 
where $\sigma \in Path_{fin}(p)$, that is, $\sigma$ ranges over all finite paths starting in $p$. The 
probability measure $Prob$ on $\sigma Field(p)$ is the unique probability measure with 
$Prob(\sigma\uparrow) = P(\sigma)$. 

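Lemma 1. ([9]) Let $p \in S_p$ and $\Gamma \subseteq Path_{fin}(p)$ such that $\sigma, \sigma' \in \Gamma$, $\sigma \neq \sigma'$ 
implies $\sigma \not\leq_{prefix} \sigma'$. Then, $Prob(\Gamma\uparrow) = \sum_{\sigma \in \Gamma} P(\sigma)$. □ 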



Definition 7. If $p \in S_p$, $L \subseteq A_\tau^*$ and $C \subseteq S_p \cup \{NIL\}$, then $Prob(p, L, C) = 
Prob(\Gamma(p)\uparrow)$, where $\Gamma(p)$ is the set of all finite paths $\sigma$ starting in $p$ with trace 
in $L$ and with the last process belonging to $C$, that is, 

$\Gamma(p) = \{\sigma \in Path_{fin}(p) : first(\sigma) = p, trace(\sigma) \in L, last(\sigma) \in C\}$. 

$Path_{full,Q}(p)$ is defined as the set of fullpaths $\sigma \in Path_{full}(p)$ such that 
there is some $k > 0$ and $\sigma' \in Path_{fin,Q}(p)$ such that $\sigma'$ is the prefix of $\sigma$ 
with the length $k$. Then, in a similar way as above we define a probability 
space on $Path_{full,Q}(p)$, the probability measure $Prob_Q(\sigma_Q\uparrow) = P(\sigma_Q)$ and 
$Prob_Q(p, L, C)$. 



Probabilistic Branching Bisimulation The new result in our approach is a 
definition of probabilistic branching bisimulation that is weaker than the one in 
[9] and that can, we think, be extended for probabilistic processes containing non- 
determinism. The bisimulation on the set of fully probabilistic graphs we propose 
is based on the notion of a set of entries (a subset of the set of probabilistic nodes) 
and a set of exits (a subset of the set of action nodes). 

In the following, we introduce the notion of entries and exits for a given 
graph with equivalence relation $R$ defined on the set of its nodes. An exit of a 
probabilistic node is an action node that is the outgoing node of an external 
action transition or an internal transition that leads to a new equivalence class. 
Every probabilistic node has a set of exits. Having the sets of exits determined for 
each probabilistic node in the graph we can obtain the set of entries. First, the 
root of the graph is always an entry. Further, an entry in one equivalence class 
is a node that is first entered from an exit of some other entry by taking either 
an external or an internal action. In such a way each entry determines the set 
of its succeeding entries. In other words, a probabilistic node $q$ is not an entry if 
it is reachable from entries belonging to the equivalence class of $q$ only through 
internal paths passing through this equivalence class. Finally, for each entry 
the probabilities for reaching the equivalence classes of its succeeding entries 
are computed. All entries with the same probability distribution are considered 
bisimilar. For nodes found not to be entries the probabilities are not computed. 
Formal definitions follow. 

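Definition 8 (Entry). If $g$ is a fully probabilistic graph and if $R$ is an 
equivalence relation on the set of states then: 

$Entry_0(g) = \{root(g)\}$, 

$Entry_{i+1}(g) = \{q : \exists r \in Entry_i(g) : \exists e \in Exit_R(r) : e \xrightarrow{a} q, a \in A_\tau$ & $q \notin [r]_R\}$ 
$\cup \{q : \exists r \in Entry_i(g) : \exists e \in Exit_R(r) : e \xrightarrow{a} q, a \in A$ & $q \in [r]_R\}$, 

where $Exit_R(r) = \{s : r \stackrel{\tau^*}{\Longrightarrow}_{[r]_R} \cdot \leadsto s$ & $\exists C \neq [r]_R : s \xrightarrow{a} C, a \in A_\tau\}$ 
$\cup \{s : r \stackrel{\tau^*}{\Longrightarrow}_{[r]_R} \cdot \leadsto s$ & $s \xrightarrow{a} [r]_R, a \in A\}$. 

Finally, $Entry_R(g) = \bigcup_{i \geq 0} Entry_i(g)$. 

By $\stackrel{\tau^*}{\Longrightarrow}$ we denote the transitive and reflexive closure of $\leadsto \cdot \xrightarrow{\tau}$ and by $\stackrel{\tau^*}{\Longrightarrow}_Q$ 
we denote the transitive and reflexive closure of $\{p \leadsto \cdot \xrightarrow{\tau} p' : p, p' \in Q\}$ for 
$Q \subseteq S_p \cup \{NIL\}$. 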



Definition 9. If $g$ is a fully probabilistic graph, $R$ and $\tilde{R}$ are equivalence 
relations on the set of states such that $\tilde{R} \subseteq R$ and if $r \in S_p$, then: 

$NextEntry_R(r) = \{q : \exists e \in Exit_R(r) : e \xrightarrow{a} q, a \in A_\tau$ & $q \notin [r]_R\}$ 
$\cup \{q : \exists e \in Exit_R(r) : e \xrightarrow{a} q, a \in A$ & $q \in [r]_R\}$ 

and $NextEntryC_{\tilde{R}}(r) = \{[q]_{\tilde{R}} : q \in NextEntry_R(r)\}$. 

Due to the fact that two entries from the same $R$ equivalence class may have 
different sets of exits and sets of next entries, these sets have to be parametrized 
by the entry they are associated to (see Example 2). 

Definition 10 (Probabilistic Branching Bisimulation). Let $g$ and $h$ be 
fully probabilistic graphs. If $R$ is an equivalence relation on $S_g \cup S_h \cup \{NIL_g, NIL_h\}$ 
such that: 

0. $(root(g), root(h)) \in R$; 

1. if $(p, q) \in R$ and $p \leadsto s$ then either 
   1.0 $(s, q) \in R$ or 
   1.1 there are $v, t$ such that $(p, v), (s, t) \in R$ and 
       $q \stackrel{\tau^*}{\Longrightarrow} v \leadsto t$ or $q \xrightarrow{\tau} \cdot \stackrel{\tau^*}{\Longrightarrow} v \leadsto t$; 

2. if $(p, q) \in R$ and $p \xrightarrow{a} s$ then either 
   2.0 $a = \tau$ and $(s, q) \in R$ or 
   2.1 there are $v, t$ such that $(q, v), (s, t) \in R$ and 
       $q \stackrel{\tau^*}{\Longrightarrow} v \xrightarrow{a} t$ or $q \leadsto \cdot \stackrel{\tau^*}{\Longrightarrow} v \xrightarrow{a} t$; 

3. there is an equivalence relation $\tilde{R}$ on $Entry_R(g) \cup Entry_R(h)$ such that $\tilde{R} \subseteq R$ 
   and 
   3.0 $(root(g), root(h)) \in \tilde{R}$; 
   3.1 if $(p, q) \in \tilde{R}$ then for any $C \in NextEntryC_{\tilde{R}}(p) \cup NextEntryC_{\tilde{R}}(q)$ 
       and for any $a \in A$, 
       $Prob_{[p]_R}(p, \tau^*, C) = Prob_{[q]_R}(q, \tau^*, C)$ and 
       $Prob_{[p]_R}(p, \tau^* a, C) = Prob_{[q]_R}(q, \tau^* a, C)$; 

then $(R, \tilde{R})$ is a probabilistic branching bisimulation relation between $g$ and $h$. 
We write $g \underline{\leftrightarrow}_{pb} h$ if there is a probabilistic branching bisimulation $(R, \tilde{R})$ 
between $g$ and $h$. 

Using $\underline{\leftrightarrow}_{pb}$ we define relation $\underline{\leftrightarrow}_{prb}$ as follows: $g \underline{\leftrightarrow}_{prb} h$ if there is a 
probabilistic branching bisimulation $(R, \tilde{R})$ between $g$ and $h$ such that $\{root(g), root(h)\}$ 
is an $R$ equivalence class and if $root(g) \leadsto s$ then there is $t$ such that $root(h) \leadsto t$ 
and $(s, t) \in R$, and vice versa. We say that $g$ and $h$ are probabilistically rooted 
branching bisimilar. The condition above is called the probabilistic rooted branching 
condition. 



From now on, instead of $Prob_{[p]_R}(p, \tau^*, C)$ and $Prob_{[p]_R}(p, \tau^* a, C)$ we will 
write $Prob_R(p, \tau^*, C)$ and $Prob_R(p, \tau^* a, C)$, respectively. (From $Prob_R(p, \tau^*, C)$ 
it is clear that $[p]_R$ is the subscript set in the original notation.) Even if $[p]_{\tilde{R}}$ is 
not a NextEntry class for $p$, we still take $Prob_{[p]_R}(p, \tau^*, [p]_{\tilde{R}}) = 1$. 

Example 1. Let $g$ and $h$ be fully probabilistic graphs given in Figure 3. We 
define the following equivalence relation: $R = \{\{1, 2, 3, 4, 5\}, \{6, 8\}, \{7, 9\}, \{NIL\}\}$. 
Then $Entry_R(g \cup h) = \{1, 3, NIL\}$ and we define $\tilde{R} = \{\{1, 3\}, \{NIL\}\}$. 





Fig. 3. Bisimilar graphs from example 1. 

Probabilities of these entries to the $\tilde{R}$ equivalence classes are given in the 
following table. In the table we put - in the $(r, C)$ field if $C \notin NextEntryC_{\tilde{R}}(r)$. 
We omit the row of the NIL entry. 

       |      τ*        |           τ*a             |            τ*b 
   r   | {1,3}   {NIL}  | {1,3}   {NIL}             | {1,3}   {NIL} 
   1   |   1       0    |   -     (1-α)/(1-αβ)      |   -     α(1-β)/(1-αβ) 
   3   |   1       0    |   -     (1-α)/(1-αβ)      |   -     α(1-β)/(1-αβ) 

Thus, $(R, \tilde{R})$ is a probabilistic branching bisimulation between $g$ and $h$. □ 



Example 2. The following example shows that the root condition as it is given 
is sufficient. 

Let $g$ and $h$ be graphs given in Figure 4. $R = \{\{1, 3\}, \{2, 4, 5, 6\}, \{NIL\}\}$ is 
a rooted branching bisimulation between $g$ and $h$ (it is the only one). But 
$NextEntry_R(1) = \{2\}$ and $NextEntry_R(3) = \{NIL\}$, from which we conclude 
that $\tilde{R}$ cannot be defined, that is, a probabilistic branching bisimulation between 
$g$ and $h$ does not exist. □ 




Fig. 4. Bisimilar graphs from example 2. 



Example 3. Let $g$ and $h$ be graphs given in Figure 5. We define the following 
relation: $R = \{\{1, 2, 4, 6, 8, 9\}, \{3, 5, 7, 12, 15, 18\}, \{10, 13, 16\}, \{11, 14, 17\}\}$. Then 

   r               |   1   |  3  |   2   |   4   |  5  |   6   |  7 
   Exit_R(r)       | 10,11 | 12  | 10,11 | 13,14 | 15  | 16,17 | 18 
   NextEntry_R(r)  |   3   |  2  |   3   |   5   |  6  |   7   |  6 

$Entry_R(g \cup h) = \{1, 2, 3, 4, 5, 6, 7\}$. If we take $\tilde{R}_0 = \{\{1, 2, 4, 6\}, \{3, 5, 7\}\}$, then, 
for instance, 1 and 2 do not have the same probabilities, which means $(R, \tilde{R}_0)$ is not a 
probabilistic branching bisimulation between $g$ and $h$. But if we refine the classes of 
$\tilde{R}_0$ into $\tilde{R}_1$ in the following way: $\tilde{R}_1 = \{\{1, 4\}, \{2, 6\}, \{3, 5, 7\}\}$, then it is easy to 
check that $\tilde{R}_1$ satisfies the third requirement in Definition 10. We conclude that 
$(R, \tilde{R}_1)$ is a probabilistic branching bisimulation between $g$ and $h$. □ 

Proposition 2. Let $g$ be a fully probabilistic graph. Then $g \underline{\leftrightarrow}_{prb} \rho(g)$. □ 

Fig. 5. Bisimilar graphs from example 3. 



Proposition 3. Let $(R, \tilde{R})$ be a probabilistic branching bisimulation between $g$ 
and $h$. If $p, q \in S_g \cup S_h$ and $(p, q) \in R$, then $NextEntryC_{\tilde{R}}(p) = NextEntryC_{\tilde{R}}(q)$. 

□ 

Corollary 1. Let $(R, \tilde{R})$ be a probabilistic branching bisimulation between $g$ and 
$h$. If $p \in Entry_R(g)$ then there is a node $q$ in $h$ such that $(p, q) \in R$. □ 

Proposition 4. If $R$ is an equivalence relation on the fully probabilistic graph 
$g$ and if $C$ is an $R$ equivalence class containing a probabilistic node then there 
is an entry node in $C$. □ 

Using the results from the previous propositions, the Congruence theorem 
can be proved. 

Theorem 1. $\underline{\leftrightarrow}_{prb}$ is a congruence relation on $G$ with respect to the 
probabilistic choice operator, the sequential composition and the abstraction operator. 

Proof. The proof for the abstraction operator is based on the relation between 
the set of entries in the original graph and the one obtained with abstraction. 
Namely, the second set is a subset of the first set. Having this in mind, a relation 
between the probabilities of the entries in the two graphs can be established. (For 
more details see [4].) For the two other operators it is not difficult to construct 
a probabilistic bisimulation relation on the composed graph from already existing 
relations on the components (in that composition). Namely, for the sequential 
composition the only interesting detail is the merging of the NIL equivalence class 
from the first component with the equivalence class of the root of the second 
component. Two cases occur, depending on whether NIL is an entry in the 
first graph or not. The part concerning the probabilistic choice operator can easily be 
proved. □ 

Theorem 2 (Soundness theorem). $G/\underline{\leftrightarrow}_{prb}$ is a model of the presented 
fully probabilistic process algebra with the verification rules PVR1, PVR2, . . . 

□ 

3 Deciding the Branching Bisimulation Equivalence 

In this section we present an algorithm that computes a probabilistic branching 
bisimulation equivalence relation for given fully probabilistic graphs. Namely, 
the algorithm decides if the root nodes of the graphs have the same branching 
structure and, further, if they have the same probability measures. At the end it 
returns a pair of relations that relates these graphs if such relations exist. The 
basic idea of the algorithm is to start with the coarsest branching bisimulation 
relation that relates two nodes if and only if they have the same branching 
structure, regardless of their probability measures. In Definition 10 one can notice 
that probabilistic transitions in the part which concerns the branching structure 
(items 0, 1 and 2) can be viewed as internal transitions. This gives us the liberty to 
employ any algorithm that decides branching bisimulation on non-probabilistic 
systems. In particular, here we use the algorithm for deciding branching 
bisimulation equivalence in [10]. The original algorithm is defined on one graph, in which 
case the output is the coarsest branching bisimulation on that graph, since it 
always exists. The algorithm can be slightly modified into an algorithm that 
works on a union of two graphs (which is what we need). In this case (Step1) 
the output is either the branching bisimulation equivalence relation $R$ between 
the two graphs with roots $root_1$ and $root_2$, which is the input of the second 
part of the algorithm; or the algorithm has found that the two graphs are not 
branching bisimilar (the root nodes are not $R$-related) and it returns the empty 
relation. In the latter case the given graphs are not probabilistically branching 
bisimilar either (Step2). Before the second part is run, the set of entries w.r.t. $R$ 
is calculated (Step3). The second part of the algorithm concerns probabilities. 
Starting from the $R$ equivalence classes restricted to the entries as the initial 
value for $\tilde{R}$ (Step4, where $BB$ is the partition induced by $R$), the algorithm 
refines the $\tilde{R}$ equivalence classes by comparing the probability measures for the 
nodes belonging to the same class (Step5). If it has been found that two or more 
nodes from the same equivalence class have different probabilities, then the class 
is split into separate subclasses. Finally, if it has been detected that the roots 
have been split apart then the algorithm terminates (Step6) with the conclusion 
that the two graphs are not probabilistically branching bisimilar (returning the 
pair $(\emptyset, \emptyset)$). Otherwise, the algorithm returns the pair of relations that makes 
graphs $g$ and $h$ probabilistically branching bisimilar (Step7). The crucial point 
here is the definition of a splitter. (Note: many algorithms concerning 
bisimulation are based on a notion of a splitter defined in an appropriate way for 
that particular relation.) 

Definition 11. Let $g$ be a fully probabilistic graph and $R$ an equivalence relation 
on $g$. Let $\tilde{R}$ be an equivalence relation that is a subset of $R$, and let $\Pi$ be the 
partition induced by $\tilde{R}$. A pair $(a, C)$ for $a \in A_\tau$ and $C \in \Pi$ is a splitter of $\Pi$ if for 
some $E \in \Pi$ and $p, p' \in E$, if $C \in NextEntryC_\Pi(p)$ or $C \in NextEntryC_\Pi(p')$ 
then 

$Prob_R(p, \tau^* a, C) \neq Prob_R(p', \tau^* a, C)$. 

Thus a splitter $(a, C)$ of a partition $\Pi$ indicates a class in $\Pi$ that contains 
states which prevent $(R, \Pi)$ from being a probabilistic branching bisimulation. 
Moreover, it indicates that partition $\Pi$ has to be refined to $\Pi'$ in such a way 
that $(a, C)$ is not a splitter of $\Pi'$. Thus, we split the set of entries into finer 
classes, subsets of the corresponding $R$ classes, until we obtain a partition (the $\tilde{R}$ 
relation) that meets the third requirement in Definition 10. Formally, 

Definition 12. Let $R$ and $\Pi$ be defined as in the previous definition and 
let $(a, C)$ be a splitter of $\Pi$. If $E \in \Pi$ we define a refinement of $E$ w.r.t. $(a, C)$, 
$Refine(E, a, C)$, in the following way: 

$Refine(E, a, C) = \{E_n : n \in N\}$ 
for some set of indices $N$ such that 

1. $\{E_n : n \in N\}$ is a partition of $E$ and 

2. $\forall n \in N : \forall s, t \in E_n : Prob_R(s, \tau^* a, C) = Prob_R(t, \tau^* a, C)$. 

The refinement of $\Pi$ w.r.t. splitter $(a, C)$ is: 

$Refine(\Pi, a, C) = \bigcup_{E \in \Pi} Refine(E, a, C)$. 

The probabilities $Prob_R(p, \tau^* a, C)$ can be computed by solving the linear 
equation system (see e.g. [8,9]) 

$x_p = 1$ if $a = \tau$ and $p \in C$ 

$x_p = 0$ if $Path_{full,[p]_R}(p, \tau^* a, C) = \emptyset$ 

$x_p = \sum_{t \in [p]_R} P(p, \tau, t) \cdot x_t + P(p, a, C)$ otherwise 
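
For a small class the "otherwise" case of this system can be solved directly. The Python sketch below uses exact Gauss-Jordan elimination over fractions instead of the method of [1]; it assumes that the states for which $x_p = 0$ or $x_p = 1$ is already known have been removed, so that the remaining system is non-singular. The function names and the two-state example, with hypothetical internal probabilities 1/2 and 1/3, are illustrations only.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over Fractions for a non-singular system A x = b."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]

def reach_probabilities(tau, direct):
    """tau[s][t]: P(s, tau, t) for s, t in the same class; direct[s]: P(s, a, C).
    Solves x_s = sum_t tau[s][t] * x_t + direct[s] for all s."""
    states = sorted(direct)
    A = [[Fraction(int(s == t)) - Fraction(tau.get(s, {}).get(t, 0)) for t in states]
         for s in states]
    x = solve(A, [Fraction(direct[s]) for s in states])
    return dict(zip(states, x))

# State 1 moves internally to state 2 with probability 1/2, otherwise does a into C;
# state 2 moves internally back to state 1 with probability 1/3, otherwise leaves with b.
alpha, beta = Fraction(1, 2), Fraction(1, 3)
probs = reach_probabilities({1: {2: alpha}, 2: {1: beta}}, {1: 1 - alpha, 2: Fraction(0)})
```

For state 1 the result equals $(1 - \alpha)/(1 - \alpha\beta)$ with $\alpha = 1/2$ and $\beta = 1/3$, the closed form one obtains by summing the geometric series over the number of rounds through the internal cycle.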

The algorithm is given step-by-step in Figure 6. The input is given as a union 
of two graphs $g$ and $h$ with roots $root_1$ and $root_2$, respectively. 



Input  : finite fully probabilistic graphs g and h with (S, ⇝, →, μ, root_1, root_2) 

Output : (R, Π) probabilistic branching bisimulation between g and h if it exists 
         (∅, ∅) if g and h are not probabilistically branching bisimilar 

Method : 

  Step1  Call the coarsest branching bisimulation relation algorithm for 
         the graphs g and h, and receive R; 

  Step2  If R = ∅ then Return (∅, ∅); 

  Step3  Compute the sets: Entry_R, NextEntry_R(r); 

  Step4  Π := {E ∩ Entry_R : E ∈ BB} \ {∅}; 

  Step5  While Π contains a splitter (a, C) do Π := Refine(Π, a, C); 

  Step6  If root_1 and root_2 are not Π-related then Return (∅, ∅); 

  Step7  Return (R, Π). 

Fig. 6. Algorithm for computing probabilistic branching bisimulation. 
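
The refinement loop of Step5 can be sketched in Python as follows. The sketch simplifies the splitter test: it assumes a caller-supplied function `prob(p, a, C)` that returns $Prob_R(p, \tau^* a, C)$ for every entry, action and class (with 0 when $C$ is not a next-entry class of $p$), so the NextEntryC side condition is not checked separately. All function names are hypothetical, and exact fractions are used so that probabilities can be compared for equality.

```python
from fractions import Fraction

def split(E, a, C, prob):
    """Group the entries of class E by their probability of reaching C via tau* a."""
    groups = {}
    for p in sorted(E):
        groups.setdefault(prob(p, a, C), []).append(p)
    return [frozenset(g) for g in groups.values()]

def find_splitter(partition, actions, prob):
    """Return a pair (a, C) that separates two entries of some class, or None."""
    for a in actions:
        for C in partition:
            for E in partition:
                if len({prob(p, a, C) for p in E}) > 1:
                    return a, C
    return None

def refine_until_stable(partition, actions, prob):
    partition = list(partition)
    while (splitter := find_splitter(partition, actions, prob)) is not None:
        a, C = splitter
        partition = [part for E in partition for part in split(E, a, C, prob)]
    return partition

# Hypothetical entries 1 and 2 that reach the class of entry 3 with different probabilities.
table = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(0)}
def toy_prob(p, a, C):
    return table[p] if a == "a" and 3 in C else Fraction(0)

stable = refine_until_stable([frozenset({1, 2}), frozenset({3})], ["a"], toy_prob)
```

Each round strictly increases the number of classes, so the loop stops after at most as many rounds as there are entries; Step6 then only has to check whether the two roots still share a class.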



Lemma 2. The algorithm can be implemented in polynomial time in the number 
of states $n$. 

Proof. Let $g$ and $h$ be finite fully probabilistic graphs with $n$ states and $m$ 
transitions (total number of states and transitions for both graphs). 

For the first part of the algorithm, finding the coarsest bisimulation relation, 
we use the algorithm in [10] which has time complexity $O(n \cdot m)$. In this step 
the probabilistic transitions are treated as internal transitions. The set of entries 
with respect to $R$ can be found with a depth first search with the algorithm in 
[1] (with time complexity 

The second part of the algorithm consists of solving the system of linear 
equations and refining the current partition with respect to a found splitter. The 
test whether $Path_{full,[p]_R}(p, \tau^* a, C) = \emptyset$ can be done by a reachability analysis of 
the underlying directed graph. In the worst case we have to repeat the refinement 
step $n$ times, and in each of them we have to solve a system of linear equations 
with $n$ variables and $n$ equations, which takes $O(n^{2.8})$ time with the method in 
[1]. Thus we obtain the time complexity of the second part of the algorithm to 
be in the worst case $O(n^{3.8})$. 

In total, since $m \leq n^2 \cdot |A_\tau|$, we obtain $O(n^{3.8})$ time complexity of the 
algorithm. □ 



4 Conclusion 

In this paper we presented a version of fully probabilistic process algebra with 
abstraction which contains, in addition to the axioms for the basic operators, a 
set of verification rules. These rules tie in successfully the idea of abstraction in 
process algebra with the results from Markov chain analysis. Furthermore, we 
proposed a probabilistic branching bisimulation relation which corresponds to 
this process algebra in the sense that it gives a model for it. In such a way we 
obtain a model for the verification rules. One of the advantages of having such 
rules is that they give the probability distribution after abstraction from the 
internal actions, which in the model (i.e. using the bisimulation relation) should 
be calculated separately. 

Due to the absence of non-determinism in the algebra we did not incorporate 
parallel composition since the model that we had used for parallel composi- 
tion in the previous work requires non-determinism. Nevertheless the process 
algebra proposed in this paper can be widely applied. Namely, in [2,3] one can 
find examples of protocols for which the protocol specification does not contain 
non-determinism. Thus, the techniques from this paper can be applied to these 
specifications for the verification part, that is, for proving that these protocols 
behave with probability 1 as a one place buffer. Moreover, the way we presented 
the definitions in the paper left room for an extension with non-determinism. 
For example, the definition of the bisimulation relation is given for the 
alternating model, which essentially includes non-determinism. Thus, we think that this 
work is a good start for obtaining a probabilistic branching bisimulation relation 
for probabilistic processes that contain non-determinism. 

Abstract. This paper proposes a partial-order semantics for a stochastic 
process algebra that supports general (non-memoryless) distributions 
and combines this with an approach to numerically analyse the first 
passage time of an event. Based on an adaptation of McMillan’s complete 
finite prefix approach tailored to event structures and process algebra, 
finite representations are obtained for recursive processes. The behaviour 
between two events is now captured by a partial order that is mapped 
on a stochastic task graph, a structure amenable to numerical analysis. 
Our approach is supported by the (new) tool Forest for generating the 
complete prefix and the (existing) tool Pepp for analysing the generated 
task graph. As a case study, the delay of the first resolution in the root 
contention phase of the IEEE 1394 serial bus protocol is analysed. 



1 Introduction 

In the classical view of system design, two main activities are distinguished: 
performance evaluation and validation of correctness. Performance evaluation 
studies the performance of the system in terms like access time, waiting time 
and throughput, whereas validation concentrates on the functional behaviour of 
the system in terms of e.g. safety and liveness properties. With the advent of 
embedded and multi-media communication systems, however, insight into both the 
functional and the real-time and performance aspects of the applications involved 
becomes of critical importance. The separation of these issues does not make 
sense anymore. 

As a result, performance aspects have been integrated in various specification 
formalisms. A prominent example is stochastic process algebra in which features 
like compositionality and abstraction are exploited to facilitate the modular 
specification of performance models. Most of these formalisms, however, restrict 
delays to be governed by negative exponential distributions. The interleaving 
semantics typically results in a mapping onto continuous-time Markov chains 
(CTMC) [1,13,19], a model for which various efficient evaluation algorithms exist 
to determine (transient and stationary) state-based measures. Although this 
approach has brought various interesting results, tools, and case studies, the 
state space explosion problem - in interleaving semantics parallelism leads to 
the product of the component state spaces - is a serious drawback. Besides that, 
the restriction to exponential distributions is often not realistic for adequately 
modelling phenomena such as traffic sources or sizes of data files stored on web- 
servers that exhibit bursty heavy-tail distributions. 

This paper proposes a partial-order semantics for a stochastic process alge- 
bra with general (continuous) distributions and combines this with techniques 
to compute the mean delay between a pair of events. The semantics is based 
on event structures [28], a well-studied partial-order model for process algebras. 
These models are less affected by the state space explosion problem as parallelism 
leads to the sum of the components state spaces rather than to their product. 
Moreover, these models are amenable to extensions with stochastic informa- 
tion [4]. A typical problem with event structures though is that recursion leads 
to infinite structures, whereas for performance analysis finite representations are 
usually of vital importance¹. To overcome this problem we use McMillan’s 
complete finite prefix approach [26]. This technique, originally developed for 1-safe 
Petri nets and recently adapted to process algebra [24] , constructs an initial part 
of the infinite semantic object that contains all information on reachable states 
and transitions. 

In our stochastic process algebra the advance of (probabilistic) time and the 
occurrence of actions is separated. This separation of discrete and continuous 
phases is similar to that in many timed process algebras and has been recently 
proposed in the stochastic setting [16,17]. Most recent proposals for incorporat- 
ing general distributions into process algebra follow this approach [3,6]. As a 
result of this separation, interaction gets an intuitive meaning - “wait for the 
slowest process” - with a clear stochastic interpretation. Moreover, abstraction 
of actions becomes possible. We will show that due to this separation the com- 
plete finite prefix approach for process algebra [24] can be easily exploited. We 
use the prototype tool Forest to automatically generate a complete finite prefix 
from a (stochastic) process algebraic specification. 
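
The stochastic reading of "wait for the slowest process" can be made concrete: if two components have to wait for each other and their delays are independent, the delay until both are ready is the maximum of the two delays, whose distribution function is the product of the two distribution functions. A minimal Python sketch under that independence assumption; the helper names and the example distributions are chosen for illustration and are not part of the algebra.

```python
import math

def exp_cdf(rate):
    """Distribution function of an exponential delay with the given rate."""
    return lambda t: 1.0 - math.exp(-rate * t) if t > 0 else 0.0

def uniform_cdf(lo, hi):
    """Distribution function of a delay uniformly distributed on [lo, hi]."""
    return lambda t: min(1.0, max(0.0, (t - lo) / (hi - lo)))

def sync_cdf(F, G):
    """Distribution function of max(X, Y) for independent delays X ~ F and Y ~ G."""
    return lambda t: F(t) * G(t)

# Two hypothetical components with uniform delays on [0, 2] and [0, 4].
H = sync_cdf(uniform_cdf(0.0, 2.0), uniform_cdf(0.0, 4.0))
```

Here `H(1.0)` is the probability that both components have finished their delay after one time unit. The product form holds for arbitrary distribution functions, not only for exponential ones.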

From the finite prefixes we generate so-called stochastic task graphs, acyclic 
directed graphs where nodes represent tasks of which the delay is represented by 
a random variable and arcs denote causal dependencies between tasks. Efficient 
numerical analysis techniques exist for task graphs, and have been implemented. 
For series-parallel graphs numerical results are exact and algorithms exist to 
compute the distribution of the delay between a start and finish task. For ar- 
bitrary graphs various approximate techniques exist to compute (rather exact) 
bounds on the mean delay [21]. We use the Pepp tool suite [8,15] to analyse the 
task graphs generated from the complete prefixes. 
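
The exactness for series-parallel graphs rests on two operations: the delay of two tasks in series is the sum of their delays, and the delay of two tasks in parallel is the maximum of their delays. The Python sketch below illustrates this for independent delays with finitely many values, which is a simplification of the continuous distributions considered in this paper; it does not model the Pepp tool, and all names are hypothetical.

```python
from fractions import Fraction

def combine(X, Y, op):
    """Distribution of op(x, y) for independent discrete delays X and Y,
    each given as a dict mapping a delay value to its probability."""
    Z = {}
    for x, px in X.items():
        for y, py in Y.items():
            Z[op(x, y)] = Z.get(op(x, y), 0) + px * py
    return Z

def series(X, Y):
    """One task after the other: the delays add up."""
    return combine(X, Y, lambda x, y: x + y)

def parallel(X, Y):
    """Both tasks must finish: wait for the slower one."""
    return combine(X, Y, max)

def mean(X):
    return sum(value * p for value, p in X.items())

# Hypothetical task graph: tasks A and B in parallel, followed by task C.
half = Fraction(1, 2)
A = {1: half, 2: half}
B = {1: half, 3: half}
C = {1: Fraction(1)}
total = series(parallel(A, B), C)
```

Applying `mean(total)` gives the mean delay between the start of the graph and the completion of C. For graphs that are not series-parallel this bottom-up evaluation is not available, which is why only bounds are computed in that case.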

¹ Apart from discrete-event simulation techniques and analysis techniques for regular 
structures (such as birth-death processes), that we do not consider here. 

Most attempts to incorporate general distributions in process algebra aim 
at discrete-event simulation techniques [3,6,14]. To the best of our knowledge, 
this paper presents the first approach to analyse stochastic process algebraic 
specifications that may contain general distributions in a numerical manner. 

The applicability of our approach is illustrated by analysing the root con- 
tention phase within the IEEE 1394 serial bus protocol [20]. In particular, we 
analyse the distribution of the delay between the detection of a root contention 
and its first resolution. 

The paper is organised as follows. In Sect. 2 we present a stochastic process 
algebra with general distributions. In Sect. 3 we show how to obtain annotated 
partial orders using the Forest tool for finite prefixes. In Sect. 4 we discuss how
these partial orders can be seen as task graphs that can be analysed with the
tool Pepp. In Sect. 5 we show how to combine Forest and Pepp in order to
perform a mean delay analysis of events after a specific state. Sect. 6 contains 
an application to the IEEE 1394 protocol, and Sect. 7 is devoted to conclusions 
and further work. An extended version of this paper can be found in [31]. 

2 A Stochastic Process Algebra 

Let $Act$ be a set of actions, $a \in Act$, $A \subseteq Act$, and $F$, $G$ be general
continuous probability distributions. The distinction between observable and
invisible actions plays no role in this paper. The stochastic process algebra used
here is a simple process algebra that contains two types of prefix processes:
process $a; B$ (action prefix) that is able to immediately offer action $a$ while
evolving into $B$, and $(F); B$ (timed prefix) that evolves into process $B$ after a
delay governed by the continuous distribution $F$. That is, the probability that
$(F); B$ evolves into $B$ before $t$ time units is $F(t)$. In the sequel such actions
are called delay actions. The syntax of our language is given by the following
grammar:

$$B ::= \mathbf{stop} \mid a; B \mid (F); B \mid B + B \mid B \parallel_A B \mid P$$

The inaction process $\mathbf{stop}$ cannot do anything. The choice between $B_1$ and
$B_2$ is denoted by $B_1 + B_2$. Parallel composition is denoted by $B_1 \parallel_A B_2$
where $A$ is the set of synchronising actions; $B_1 \parallel_\emptyset B_2$ is abbreviated
to $B_1 \mid\mid\mid B_2$. Processes cannot synchronise on delay actions. The semantics of
the parallel operator $\parallel_A$ follows the semantics of the parallel operator of
LOTOS [2] and thus allows for multi-way synchronisation. Finally, $P$ denotes
process instantiation where a behaviour expression is assumed to be in the
context of a set of process definitions of the form $P := B$ with $B$ possibly
containing process instantiations of $P$. In this paper, we assume that a process
algebra expression has a finite number of reachable states.

A few words on $B_1 + B_2$ are in order. $B_1 + B_2$ behaves either as $B_1$ or $B_2$,
but not as both. At execution the fastest process, i.e., the process that is enabled 
first, is selected. This is known as the race condition. If this fastest process is not 
uniquely determined, a non-deterministic selection among the fastest processes 
is made. 
Example 1. In the rest of this paper we use the following stochastic process
algebra expression as a running example:

$$B_{ex} := (a; (G); d; \mathbf{stop} \parallel_{a,d} a; (F_1); c; (F_2); d; \mathbf{stop}) \parallel_c b; c; \mathbf{stop}$$
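The grammar of Sect. 2 can be mirrored by a small abstract syntax tree. The following is a minimal sketch in Python, not part of Forest; all class and function names (`Stop`, `Act`, `Delay`, `Choice`, `Par`, `Inst`, `show`) are hypothetical, and the encoding of the running example assumes that $B_{ex}$ reads $(a; (G); d; \mathbf{stop} \parallel_{a,d} a; (F_1); c; (F_2); d; \mathbf{stop}) \parallel_c b; c; \mathbf{stop}$.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Stop:
    """The inaction process stop."""


@dataclass(frozen=True)
class Act:
    """Action prefix a; B."""
    action: str
    cont: "Proc"


@dataclass(frozen=True)
class Delay:
    """Timed prefix (F); B, where dist names the distribution F."""
    dist: str
    cont: "Proc"


@dataclass(frozen=True)
class Choice:
    """Choice B1 + B2."""
    left: "Proc"
    right: "Proc"


@dataclass(frozen=True)
class Par:
    """Parallel composition B1 ||_A B2 with synchronisation set A."""
    left: "Proc"
    sync: FrozenSet[str]
    right: "Proc"


@dataclass(frozen=True)
class Inst:
    """Process instantiation P."""
    name: str


Proc = Union[Stop, Act, Delay, Choice, Par, Inst]


def show(p: Proc) -> str:
    """Pretty-print an expression in a plain-text version of the paper's syntax."""
    if isinstance(p, Stop):
        return "stop"
    if isinstance(p, Act):
        return f"{p.action}; {show(p.cont)}"
    if isinstance(p, Delay):
        return f"({p.dist}); {show(p.cont)}"
    if isinstance(p, Choice):
        return f"({show(p.left)} + {show(p.right)})"
    if isinstance(p, Par):
        sync = ",".join(sorted(p.sync))
        return f"({show(p.left)} ||{{{sync}}} {show(p.right)})"
    return p.name


# Assumed reading of the running example B_ex.
B_EX = Par(
    Par(
        Act("a", Delay("G", Act("d", Stop()))),
        frozenset({"a", "d"}),
        Act("a", Delay("F1", Act("c", Delay("F2", Act("d", Stop()))))),
    ),
    frozenset({"c"}),
    Act("b", Act("c", Stop())),
)
```

Calling `show(B_EX)` prints the expression back in linear form, which is a convenient check that the tree matches the intended term.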

3 Partial Orders, Finite Prefixes and FOREST 

In [25,26], McMillan presents an algorithm that, for a given 1-safe Petri net, con- 
structs an initial part of its occurrence net called unfolding or maximal branching 
process [9,28]. The so-called complete finite prefix of the occurrence net contains 
all information on reachable states and transitions. An important optimisation 
of the algorithm has been defined in [11]. This complete finite prefix can be used 
as the basis for model checking [10,34]. 

In [24], Langerak and Brinksma adopt the complete finite prefix approach 
for process algebra for a model similar to occurrence nets called condition event 
structures. In doing so, they have given an event structure semantics to process 
algebra. In this section, we briefly recall some definitions of [11] and [24] that are 
needed for the remainder of this paper. We show how to obtain partial orders 
from local configurations. Finally, we introduce Forest, a prototype tool which
is based on the results of [24].

Conditions and Events. A process algebra expression can be decomposed
into so-called conditions, which are action prefix expressions together with
information about the synchronisation context [29]. A condition $C$ is defined by

$$C ::= \mathbf{stop} \mid a; B \mid (F); B \mid C \parallel_A \mid \parallel_A C$$

where $B$ is a process algebra expression. Intuitively, a condition of the form
$C \parallel_A$ means that $C$ is the left operand of a parallel operator with action set
$A$. Similarly, a condition of the form $\parallel_A C$ means that $C$ is the right operand
of a parallel operator with action set $A$. For the construction of the complete
finite prefix, the distinction between action prefix conditions and time prefix
conditions plays no role; in the sequel both prefix conditions will be represented
by the expression $a; B$.

A condition event structure is a 4-tuple $(C, E, \bowtie, \prec)$ with $C$ a set of
conditions, $E = E_{act} \cup E_{delay}$ a set of events, $\bowtie \subseteq C \times C$ the choice
relation (symmetric and irreflexive), and $\prec \subseteq (C \times E) \cup (E \times C)$ the flow
relation. The set $E_{act}$ is the set of action events and $E_{delay}$ is the set of delay
events. Let $E'$ be a set of events, then the function $delay(E')$ returns the delay
events of $E'$, i.e. $delay(E') = \{e \in E' \mid e \in E_{delay}\}$. Condition event structures
are closely related to Petri nets; the conditions correspond to places whereas
the events correspond to transitions. In [24], actions and process instantiations
are labelled with unique indices. These indices are used to create unique event
identifiers. Furthermore, these indices are used to efficiently compute the finite
prefix. For this paper, these indices and identifiers are not important, and are
therefore omitted.
States. A state is a tuple $(S, R)$ with $S \subseteq C$ a set of conditions, and
$R \subseteq S \times S$ an irreflexive and symmetric relation between conditions called the
choice relation: $R \subseteq \bowtie$. A state $(S, R)$ corresponds to a ‘global state’ of the
system; for each process in the system it stores the possible next condition(s).
In fact, a state can always be represented by a process algebra expression.
Conditions and their choice relations can be obtained by decomposing a process
algebra expression. The decomposition function $dec$, which maps a process
algebra expression $B$ onto a state, is recursively defined by
$dec(B) = (S(B), R(B))$ with



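$$\begin{array}{lcl}
dec(\mathbf{stop}) & = & (\{\mathbf{stop}\}, \emptyset) \\
dec(a; B) & = & (\{a; B\}, \emptyset) \\
dec(B_1 \parallel_A B_2) & = & (S(B_1)\parallel_A \,\cup\, \parallel_A S(B_2),\; R(B_1)\parallel_A \,\cup\, \parallel_A R(B_2)) \\
dec(B_1 + B_2) & = & (S(B_1) \cup S(B_2),\; R(B_1) \cup R(B_2) \cup (S(B_1) \times S(B_2))) \\
dec(P) & = & dec(B) \quad \text{if } P := B
\end{array}$$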



In [24] it is shown how this decomposition function can be used to construct a
derivation system for condition event transitions (i.e. the flow relation $\prec$).
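The recursive definition of $dec$ translates almost literally into code. The sketch below is only an illustration under several assumptions: expressions are encoded as nested tuples (a hypothetical encoding, not Forest's), a condition is a pair of a synchronisation context (outermost operator first) and a prefix expression, the choice relation is stored as ordered pairs exactly as in the defining equation (to be read as unordered), and process definitions are assumed to be guarded so that the recursion terminates.

```python
def dec(expr, defs=None):
    """Decompose a process expression into a state (S, R).

    Expression encoding (hypothetical):
      ("stop",), ("act", a, B), ("delay", F, B),
      ("choice", B1, B2), ("par", B1, A, B2) with A a frozenset, ("inst", P).
    A condition is (context, prefix); context is a tuple of ("L", A) / ("R", A)
    markers, recording the parallel operators the prefix sits under.
    """
    defs = defs or {}
    kind = expr[0]
    if kind in ("stop", "act", "delay"):
        # dec(stop) = ({stop}, {}) and dec(a; B) = ({a; B}, {})
        return {((), expr)}, set()
    if kind == "par":
        _, left, sync, right = expr
        (s1, r1), (s2, r2) = dec(left, defs), dec(right, defs)

        def tag(side, cond):
            context, prefix = cond
            return (((side, sync),) + context, prefix)

        conds = {tag("L", c) for c in s1} | {tag("R", c) for c in s2}
        rel = {(tag("L", c), tag("L", d)) for c, d in r1}
        rel |= {(tag("R", c), tag("R", d)) for c, d in r2}
        return conds, rel
    if kind == "choice":
        _, left, right = expr
        (s1, r1), (s2, r2) = dec(left, defs), dec(right, defs)
        # Every condition of B1 is in choice with every condition of B2.
        return s1 | s2, r1 | r2 | {(c, d) for c in s1 for d in s2}
    if kind == "inst":
        return dec(defs[expr[1]], defs)
    raise ValueError(f"unknown expression kind: {kind!r}")


STOP = ("stop",)
# Assumed reading of the running example B_ex.
B_EX = (
    "par",
    ("par",
     ("act", "a", ("delay", "G", ("act", "d", STOP))),
     frozenset("ad"),
     ("act", "a", ("delay", "F1", ("act", "c", ("delay", "F2", ("act", "d", STOP)))))),
    frozenset("c"),
    ("act", "b", ("act", "c", STOP)),
)
```

Under this encoding, `dec(B_EX)` yields three conditions and an empty choice relation, in line with the three conditions of the initial marking mentioned in Example 2.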

Configurations. Let $(C, E, \bowtie, \prec)$ be a condition event structure. We adopt
some Petri net terminology: a marking is a set of conditions. A node is either
a condition or an event. The preset of a node $n$, denoted by ${}^\bullet n$, is defined by
${}^\bullet n = \{m \in C \cup E \mid m \prec n\}$, and the postset $n^\bullet$ by
$n^\bullet = \{m \in C \cup E \mid n \prec m\}$. The initial marking $M_0$ is defined by
$M_0 = \{c \in C \mid {}^\bullet c = \emptyset\}$. An event $e$ is enabled in a marking $M$ if
${}^\bullet e \subseteq M$. Let $M$ be a marking, then we define the function $enabled(M)$ as
follows: $enabled(M) = \{e \in E \mid {}^\bullet e \subseteq M\}$.

The transitive and reflexive closure of the flow relation $\prec$ is denoted by $\leq$.
The conflict relation on nodes, denoted by $\#$, is defined as follows: let $n_1$ and
$n_2$ be two different nodes, then $n_1 \# n_2$ iff there are two distinct nodes $m_1$
and $m_2$, such that $m_1 \leq n_1$ and $m_2 \leq n_2$, with either (i) $m_1$ and $m_2$ are two
conditions in the choice relation, i.e. $m_1 \bowtie m_2$, or (ii) $m_1$ and $m_2$ are two events
with ${}^\bullet m_1 \cap {}^\bullet m_2 \neq \emptyset$. Two nodes $n_1$ and $n_2$ are said to be independent,
notation $n_1 \smile n_2$, iff $\neg(n_1 \leq n_2) \wedge \neg(n_2 \leq n_1) \wedge \neg(n_1 \# n_2)$.

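Let $c$ be a condition, then we define $\bowtie(c)$ to be the set of conditions in choice
with $c$, i.e. $\bowtie(c) = \{c' \in C \mid c \bowtie c'\}$. Similarly for a set of conditions
$C_0 \subseteq C$: $\bowtie(C_0) = \{c' \in C \mid \exists c \in C_0 : c \bowtie c'\}$. For an event $e \in E$, and
$M$ and $M'$ markings, there is an event transition $M \xrightarrow{e} M'$ iff ${}^\bullet e \subseteq M$ and
$M' = (M \cup e^\bullet) \setminus ({}^\bullet e \cup \bowtie({}^\bullet e))$.

An event sequence is a sequence of events $e_1 \ldots e_n$ such that there are markings
$M_1, \ldots, M_n$ with $M_0 \xrightarrow{e_1} M_1 \xrightarrow{e_2} \ldots \xrightarrow{e_n} M_n$.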
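The marking-level definitions above (preset, postset, initial marking, enabled events and the event transition rule) can be tried out on a tiny structure. The following is a minimal sketch with hypothetical names; the flow relation is a set of (source, target) pairs and the choice relation a symmetric set of pairs of conditions. The example structure is the unfolding one would expect for $a; \mathbf{stop} + b; \mathbf{stop}$ and is an assumption made for illustration.

```python
def preset(node, flow):
    """All nodes m with m -> node in the flow relation."""
    return {m for (m, n) in flow if n == node}


def postset(node, flow):
    """All nodes m with node -> m in the flow relation."""
    return {n for (m, n) in flow if m == node}


def initial_marking(conditions, flow):
    """Conditions with an empty preset."""
    return {c for c in conditions if not preset(c, flow)}


def enabled(marking, events, flow):
    """Events whose preset is contained in the marking."""
    return {e for e in events if preset(e, flow) <= marking}


def fire(marking, event, flow, choice):
    """Event transition: M' = (M u post(e)) \\ (pre(e) u choice(pre(e)))."""
    pre = preset(event, flow)
    if not pre <= marking:
        raise ValueError(f"{event!r} is not enabled")
    in_choice = {d for (c, d) in choice if c in pre}
    return (marking | postset(event, flow)) - (pre | in_choice)


# Hypothetical structure for a; stop + b; stop:
# c1 = "a; stop" and c2 = "b; stop" are in choice; c3, c4 are "stop" conditions.
CONDITIONS = {"c1", "c2", "c3", "c4"}
EVENTS = {"ea", "eb"}
FLOW = {("c1", "ea"), ("ea", "c3"), ("c2", "eb"), ("eb", "c4")}
CHOICE = {("c1", "c2"), ("c2", "c1")}

M0 = initial_marking(CONDITIONS, FLOW)
M1 = fire(M0, "ea", FLOW, CHOICE)
```

Firing `ea` removes both its own precondition `c1` and the condition `c2` that is in choice with it, so the alternative branch is no longer enabled afterwards.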

We call $E_{conf} = \{e_1, \ldots, e_n\}$ a configuration of the condition event
structure. A configuration $E_{conf}$ must be conflict-free and backward closed with
respect to the $\leq$ relation. For an event $e \in E$, the local configuration $[e]$ is
defined by $[e] = \{e' \in E \mid e' \leq e\}$. The causal ordering $\leq$ restricted to
$E \times E$ induces a partial order over a local configuration $[e]$ (see [22,28,30]).

A cut is a marking $M$ which is maximal w.r.t. set inclusion and such that
for each pair of different conditions $c$ and $c'$ in $M$ the following holds:
$c \smile c'$ or $c \bowtie c'$. It can be shown [24] that each configuration corresponds to a
cut which can be uniquely associated to a state. The state corresponding to the
cut of a configuration $E_{conf}$ is denoted by $State(E_{conf})$.

Unfolding. In [24], Langerak and Brinksma present an algorithm to unfold a
process algebra expression $B$ into a condition event structure $Unf(B)$. The
representation $Unf(B)$ may be infinite for recursive processes. In order to overcome
this problem they adapted McMillan’s approach to compute the so-called
complete finite prefix of a Petri net unfolding to the setting of condition event
structures. The finite prefix algorithm is based on a partial order relation $\sqsubset$,
called an adequate order. This relation is defined on finite configurations of
$Unf(B)$. This adequate order $\sqsubset$ is used to identify so-called cut-off events
which do not introduce new global states. An event $e$ is a cut-off event if
$Unf(B)$ contains a local configuration $[e_0]$ such that (i) $State([e]) = State([e_0])$
and (ii) $[e_0] \sqsubset [e]$. So, a cut-off event is an event of which the marking
corresponds to a global state which has already been identified ‘earlier’ in the
unfolding. Conceptually a finite prefix is obtained by taking an unfolding
$Unf(B)$ and cutting away all successor nodes of cut-off events. It is clear that
the finite prefix depends on the adequate order $\sqsubset$ used to compare
configurations. Furthermore, the complete finite prefix approach only works for
finite state processes, i.e. processes with a finite number of reachable states.
In this paper we adopt the adequate order of [24]. The complete finite prefix
corresponding with this adequate order is denoted by $FP(B)$.
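The cut-off criterion can be illustrated with a small sketch. It makes two simplifying assumptions that are not taken from the paper: the events and the state of each local configuration are given up front (the actual algorithm detects cut-offs while the unfolding is being built), and the adequate order compares configurations by size, as in McMillan's original ordering, rather than by the order of [24]. The names `local_configuration`, `cut_off_events`, `causes` and `state_of` are hypothetical.

```python
def local_configuration(event, causes):
    """[e]: the event together with all of its causal predecessors.

    `causes` maps an event to its direct causal predecessors (events only).
    """
    seen, todo = set(), [event]
    while todo:
        x = todo.pop()
        if x not in seen:
            seen.add(x)
            todo.extend(causes.get(x, ()))
    return seen


def cut_off_events(events, causes, state_of):
    """Events e for which some e0 has the same state and a smaller [e0]."""
    size = {e: len(local_configuration(e, causes)) for e in events}
    cut_offs = set()
    for e in events:
        for e0 in events:
            if e0 != e and state_of[e0] == state_of[e] and size[e0] < size[e]:
                cut_offs.add(e)
                break
    return cut_offs


# Hypothetical chain e1 < e2 < e3 in which e3 leads back to the state reached by e1.
CAUSES = {"e2": ["e1"], "e3": ["e2"]}
STATE_OF = {"e1": "s1", "e2": "s2", "e3": "s1"}
```

For this chain, `cut_off_events(["e1", "e2", "e3"], CAUSES, STATE_OF)` marks only `e3` as a cut-off, since its state was already reached by the smaller configuration `[e1]`.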

Example 2. Fig. 1 shows the condition event structure of the unfolding
$Unf(B_{ex})$ of the process algebra expression $B_{ex}$. As the process algebra
expression $B_{ex}$ does not contain process recursion, the unfolding $Unf(B_{ex})$ is
already finite by itself. Conditions are represented by circles and events are
depicted by squares. The initial marking $M_0$ is represented by the three
conditions at the top of Fig. 1. The local configuration of event $c$ is
$[c] = \{a, (F_1), b, c\}$. The state of configuration $[c]$ is formally represented by
$State([c]) = (\{(G); d; \mathbf{stop} \parallel_{a,d} \parallel_c,\ \parallel_{a,d} (F_2); d; \mathbf{stop} \parallel_c,\ \parallel_c \mathbf{stop}\}, \emptyset)$.
The process algebra expression corresponding with $State([c])$ is
$((G); d; \mathbf{stop} \parallel_{a,d} (F_2); d; \mathbf{stop}) \parallel_c \mathbf{stop}$. The local configuration of
event $d$ is $[d] = \{a, b, (G), (F_1), c, (F_2), d\}$. The state of configuration $[d]$ is
represented by the three leaf conditions. The partial order of the events within
$[d]$ is induced by the flow relation $\prec$ which is depicted by the arrows between
the conditions and events.



FOREST. Forest² [31] is a prototype tool that is based on the unfolding and
finite prefix algorithms of [24]. Given a process algebra expression $B$ (with a
finite number of reachable states), Forest computes the corresponding complete
finite prefix $FP(B)$ as a condition event structure. The tool allows the use of
McMillan’s original adequate ordering or the adequate ordering defined in [24].
Forest is used as a prototype tool to experiment with several aspects of the
unfolding algorithm, like alternative adequate orderings, independence
algorithms, cut-off criteria, etc. For these experimental purposes, Forest can
export the finite prefix $FP(B)$ either to a textual representation or to a format
suitable as input for graph drawing tools like vcg [32] or dot [12]. Future
additions to Forest will include an interactive visual simulator and a model
checking module. Forest has been implemented in C++ (4000 lines of code) and
the development took roughly eight man months.

² Forest stands for “a tool for event structures”.

Fig. 1. Condition event structure of the unfolding $Unf(B_{ex})$.

This section has only briefly addressed the construction of the complete finite
prefix $FP(B)$ of a process algebra expression $B$. For the remainder of this paper,
the most important aspect of the unfolding algorithm is that the construction of
the condition event structure induces a partial order on the events of a (local)
configuration. Forest can be used to compute such partial orders.

4 Task Graph Analysis and PEPP 

The tool Pepp³ was developed at the University of Erlangen [8,15] in the
early 1990s. The tool has a broad functionality, including program
instrumentation, monitoring (using the hardware monitor ZM4) and trace
analysis. In this paper, we only use the following functionality of Pepp: (i) the
creation of task graphs and (ii) the automatic analysis of task graphs.

³ Pepp stands for “Performance Evaluation of Parallel Programs”.

Fig. 2. Task graphs of (a) the local configuration of event $d$ of the running
example $B_{ex}$ and (b) the local configuration of event $\mathit{root}_1$ of the root
contention protocol (discussed in Sect. 6).

Task graphs consist of nodes connected by directed edges. Nodes, which rep- 
resent tasks to be executed, can be of several types (e.g. hierarchical, cyclic and 
parallel nodes) but here we will only use so-called elementary nodes that model 
activities taking a certain amount of time, i.e. delay actions. The time that an 
activity or task takes is governed by a continuous distribution function. The 
dependency between tasks is modelled by the directed edges between the nodes. 

Pepp supports several built-in distribution functions, like deterministic, ex- 
ponential, approximate, and mixed Erlang distributions, the parameters of which 
can be chosen by the user. It is also possible to use general distributions in a
numerical form, which can be created either by the user or by the additional tool
Capp⁴ [27]. A numerical representation of a distribution is given by a text file
containing the offset of the density function, the step size and the density values
for each step. Capp also allows the graphical representation of distribution and
density functions. Nodes can be created interactively by the user via a graphical 
interface. Nodes can be connected by edges representing causal dependencies. 
These dependencies are required to be acyclic and the resulting graph is called 
a task graph. In fact a task graph can be seen as a partial ordering of nodes. 

Example 3. Suppose we are interested in the running time of event $d$ of $B_{ex}$,
starting from the initial state. If we only consider the delay actions and the
causal dependencies of $Unf(B_{ex})$ of Fig. 1, we obtain the task graph of Fig. 2 (a).



Analysis of task graphs. After a task graph has been input to Pepp, the
run time distribution of the model can be analysed in several ways. The most
attractive mode of analysis is via SPASS⁵. In order to analyse a task graph
using SPASS it has to be in series-parallel reducible form. This means it can be
reduced to a single node by successively applying two reduction steps:

— Series reduction: in this reduction step a sequence of nodes is reduced to a
single node.

— Parallel reduction: in this reduction step several parallel nodes with the same
predecessors and successors are reduced to a single node.

⁴ Capp stands for “Calculation and Presentation Package”.

⁵ SPASS stands for “Series Parallel Structure Solver”.

SPASS analysis is an exact type of analysis; the SPASS reductions preserve the 
performance analysis aspects. A series-parallel task graph is reduced by SPASS 
to a single node with a distribution that represents the first passage time of the 
complete task graph. This distribution function is calculated in a numerical way 
and can be visualised (together with its corresponding density function) using 
Capp. 
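The two reduction steps have a simple numerical reading when delays are independent and discretised on a fixed step size: a series reduction adds delays (a convolution of the densities), and a parallel reduction waits for the slowest branch (a product of the distribution functions). The sketch below illustrates this idea only; it is not the SPASS implementation, the function names are hypothetical, and the toy distributions are made up for the example. A distribution is a list `p` where `p[k]` is the probability that the delay equals `k` steps.

```python
def series(p, q):
    """Delay of two tasks in sequence: discrete convolution of p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def cdf(p):
    """Cumulative distribution function of a discretised density."""
    total, out = 0.0, []
    for x in p:
        total += x
        out.append(total)
    return out


def parallel(p, q):
    """Delay of two parallel tasks (maximum): product of the two CDFs."""
    n = max(len(p), len(q))
    fp = cdf(p + [0.0] * (n - len(p)))
    fq = cdf(q + [0.0] * (n - len(q)))
    prod = [a * b for a, b in zip(fp, fq)]
    return [prod[0]] + [prod[k] - prod[k - 1] for k in range(1, n)]


def mean(p, step=1.0):
    """Mean delay, with `step` the step size of the discretisation."""
    return step * sum(k * x for k, x in enumerate(p))


# Toy distributions (assumptions): G takes 2 or 3 steps, F1 exactly 1 step,
# F2 takes 1 or 2 steps. Shape of the graph: G in parallel with (F1 ; F2).
G = [0.0, 0.0, 0.5, 0.5]
F1 = [0.0, 1.0]
F2 = [0.0, 0.5, 0.5]
TOTAL = parallel(G, series(F1, F2))
```

With these toy values the series branch takes 2 or 3 steps with equal probability, so the maximum of the two branches equals 2 steps with probability 0.25 and 3 steps with probability 0.75.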

If a task graph is not series-parallel reducible it can be analysed using sev- 
eral well-known approximate (bounding) methods [15,21]. The basic idea behind 
these approximations is that nodes are added or deleted until the task graph be- 
comes series-parallel reducible. This leads to upper and lower bounds of the first 
passage time of the task graph. Pepp offers several of these bounding techniques. 

It is also possible to approximate the analysis by transforming the task graph 
into an interleaving transition system, and approximating the distributions by 
deterministic and exponential distributions. This approximation suffers heavily 
from state space explosion problems and does not exploit the advantages of the 
partial order properties; for these reasons this analysis method has not been used 
in this paper. 

Pepp is a powerful analysis tool for stochastic task graphs. For simple
examples these task graphs can easily be created manually. For realistic
system designs, though, developing task graphs directly becomes increasingly
cumbersome and error prone. An effective solution to this problem, as first
recognised by Herzog in [18], is to automatically generate task graphs in a
compositional manner from a stochastic process algebra specification. While [18]
reveals some problems in using task graphs as a semantic model for (non-recursive)
stochastic process algebras, our approach - that is aimed at recursive processes -
is to generate task graphs from a finite event structure semantics.

5 First Passage Time Analysis 

In Sect. 3, we discussed how a partial order of events of a local configuration $[e]$
can be obtained from a finite prefix $FP(B)$ of an unfolding. In this section we
discuss how the partial orders generated by Forest can be used for first passage
time analysis with Pepp.

Algorithm 1 constructs a task graph of the local configuration of an event 
e starting from the initial state of a process algebra expression. With Pepp we 
can compute the first passage time of event e to occur (starting from the initial 
state). If the partial order of [e] happens to be series-parallel reducible, Pepp 
will even compute the distribution function of the runtime. 
Algorithm 1. Construct the task graph for $[e]$ starting from the initial state of $B$.

1. Specify the target event $e$ within the process algebra expression $B$.

2. Use Forest to compute the finite prefix $FP(B)$ until event $e$ has occurred or the
complete finite prefix has been generated. If the prefix does not contain $e$ (which
means that $e$ is not reachable), then stop; apparently the problem was not
well-defined.

3. Consider the local configuration $[e]$; together with the causal ordering $\leq$ this
induces a partial order $P$.

4. Project $P$ onto the delay events. This yields a task graph that can be used as input
to Pepp.
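Steps 3 and 4 of Algorithm 1 can be sketched as follows. The code is an illustration under stated assumptions, not the Forest implementation: the causal order is given at the level of events only (conditions are abstracted away), `causes` maps each event to its direct causal predecessors, and the dependencies for the running example are read off from the local configurations given in Example 2. The projection keeps an edge between two delay events only if no other delay event lies between them.

```python
def local_configuration(event, causes):
    """[e]: the event together with all of its causal predecessors."""
    seen, todo = set(), [event]
    while todo:
        x = todo.pop()
        if x not in seen:
            seen.add(x)
            todo.extend(causes.get(x, ()))
    return seen


def task_graph(target, causes, delay_events):
    """Project the partial order on [target] onto the delay events."""
    conf = local_configuration(target, causes)
    nodes = conf & delay_events
    # Strict causal predecessors of every delay event in the configuration.
    below = {d: local_configuration(d, causes) - {d} for d in nodes}
    edges = set()
    for d in nodes:
        preds = below[d] & nodes
        for p in preds:
            # Keep p -> d only if no other delay event q satisfies p < q < d.
            if not any(p in below[q] for q in preds if q != p):
                edges.add((p, d))
    return nodes, edges


# Assumed event-level dependencies of the running example B_ex:
# a < (G), a < (F1), (F1) < c, b < c, c < (F2), (G) < d, (F2) < d.
CAUSES = {"G": ["a"], "F1": ["a"], "c": ["F1", "b"], "F2": ["c"], "d": ["G", "F2"]}
DELAYS = {"G", "F1", "F2"}
NODES, EDGES = task_graph("d", CAUSES, DELAYS)
```

For event `d` this yields the delay events `G`, `F1` and `F2` with a single dependency from `F1` to `F2`; that is, `G` runs in parallel with the sequence `F1`, `F2`, which is series-parallel reducible.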



But the analysis is not restricted to starting from the initial state. We can
also supply a set of independent events $\{e_1, \ldots, e_n\}$ as a starting point, and
ask for the passage time for an event $e$ to occur after these events. There is
however a constraint involved here: if after the events $\{e_1, \ldots, e_n\}$ a delay
action is enabled, then this delay action has to be causally dependent on at least
one event in $\{e_1, \ldots, e_n\}$. In other words, the following should hold:

$$\forall e_{en} \in enabled(State([e_1] \cup \ldots \cup [e_n])) : \left( e_{en} \in E_{delay} \Rightarrow \exists e_i \in \{e_1, \ldots, e_n\} : e_i \leq e_{en} \right)$$

Otherwise there is no way to determine when such a delay event $e_{en}$ may have
started. Algorithm 2 shows how to apply Pepp when the start state is determined
by a set of independent events within the finite prefix $FP(B)$. Of course,
the target event $e$ has to be causally dependent on the events in $\{e_1, \ldots, e_n\}$.
If this is not the case, Algorithm 1 - started in step 5 of Algorithm 2 - will
unsuccessfully terminate in step 2.

Note that the partial orders obtained by both algorithms are only useful for
Pepp if the configuration between the initial event(s) and the target event $e$
contains at least one delay event.

Example 4. Consider Fig. 1 which corresponds to the condition event structure
of $Unf(B_{ex})$. Suppose we are interested in the runtime of event $d$. The sets $\{a\}$
and $\{a, b\}$ can both be used as input for Algorithm 2; in fact, for this example
both yield the task graph of Fig. 2 (a). The singleton set $\{b\}$ can also
be used as a starting point for Algorithm 2, as the delay events $(G)$ and $(F_1)$
are not enabled in $State([b])$; only the event $a$ is enabled in $State([b])$. Again,
the task graph of Fig. 2 (a) will be computed. The singleton set $\{c\}$, however,
cannot be used as a valid input for Algorithm 2 as the delay event $(G)$, which is
enabled in $State([c])$, does not depend on event $c$. The set $\{a, c\}$ cannot be used
as input for Algorithm 2 either, because the events $a$ and $c$ are not independent:
$a \leq c$.
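The admissibility checks on a set of start events can be replayed on the running example. The sketch below is a simplification under stated assumptions: it works on an event-level causal order of $B_{ex}$ read off from Example 2, it only checks causal independence (the example contains no conflicts), and an event counts as enabled after a configuration when all its direct causes are in the configuration. The names `valid_start`, `CAUSES`, `DELAYS` and `EVENTS` are hypothetical.

```python
# Assumed event-level dependencies of the running example B_ex.
CAUSES = {"G": ["a"], "F1": ["a"], "c": ["F1", "b"], "F2": ["c"], "d": ["G", "F2"]}
DELAYS = {"G", "F1", "F2"}
EVENTS = {"a", "b", "G", "F1", "c", "F2", "d"}


def local_configuration(event):
    """[e]: the event together with all of its causal predecessors."""
    seen, todo = set(), [event]
    while todo:
        x = todo.pop()
        if x not in seen:
            seen.add(x)
            todo.extend(CAUSES.get(x, ()))
    return seen


def valid_start(start):
    """Check whether `start` is an admissible set of start events."""
    start = set(start)
    # The start events must be pairwise causally unrelated.
    for x in start:
        for y in start:
            if x != y and x in local_configuration(y):
                return False
    # Configuration reached after all start events have occurred.
    conf = set().union(*(local_configuration(x) for x in start))
    # Events not yet executed whose direct causes have all occurred.
    enabled = {e for e in EVENTS - conf if set(CAUSES.get(e, ())) <= conf}
    # Every enabled delay event must depend on at least one start event.
    return all(local_configuration(e) & start for e in enabled & DELAYS)
```

Under these assumptions the function accepts `{"a"}`, `{"a", "b"}` and `{"b"}` and rejects `{"c"}` and `{"a", "c"}`, which agrees with the discussion in Example 4.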

6 The Root Contention Phase in IEEE 1394 

This section discusses a small case study where we applied our approach to
compute the mean passage time of the first resolution of the root contention phase
of the IEEE 1394 protocol [20]. Due to space limitations only the Forest and
Pepp models of the root contention phase are discussed here. A more thorough
discussion can be found in [31].

Algorithm 2. Construct the task graph for $[e]$ starting from $\{e_1, \ldots, e_n\}$.

1. Specify the target event $e$ within the process algebra expression $B$.

2. Use Forest to compute the finite prefix $FP(B)$ until the events $\{e_1, \ldots, e_n\}$ have
all occurred or the complete finite prefix has been generated. If $FP(B)$ does not
contain all events of $\{e_1, \ldots, e_n\}$, then stop; apparently the problem was not
well-defined.

3. If there are conflicts among the events in $\{e_1, \ldots, e_n\}$ then stop, as apparently the
problem is not well-defined; otherwise continue.

4. Calculate $S = State([e_1] \cup \ldots \cup [e_n])$.

5. Check if all enabled delay actions which causally depend on $S$ are dependent on
at least one event from $\{e_1, \ldots, e_n\}$. If not, stop; apparently the problem is not
well-defined. Otherwise, apply Algorithm 1 with $S$ as initial state and compute the
partial order of the local configuration of target event $e$.

FOREST. Fig. 3 presents the specification of the root contention protocol in
our process algebra. The model itself is based on [33]. The two $\mathit{Node}_i$ processes
are connected to each other by two $\mathit{Wire}_i$ processes, which represent the
communication lines between the components. Each $\mathit{Node}_i$ process has a $\mathit{Buf}_i$
process which can hold a single message from the other $\mathit{Node}_{1-i}$. New messages
from $\mathit{Node}_{1-i}$ will simply overwrite older messages. Both nodes start (via $\mathit{Proc}_i$)
to wait $g_i(t)$ units of time. If after waiting, the buffer is still empty (i.e.
$\mathit{check\_emp}_i$), the node will send a $\mathit{send\_req}_i$ to its partner and will subsequently
wait for an acknowledgement. If this acknowledgement (i.e. $\mathit{check\_ack}_i$) arrives,
$\mathit{Node}_i$ will declare itself a child using action $\mathit{child}_i$. On the other hand, if after
waiting $g_i(t)$ units of time, $\mathit{Node}_i$ receives a $\mathit{check\_req}_i$ action, it declares itself
to be the leader using action $\mathit{root}_i$. The delay of the communication line is
modelled by the delay action $(F_i)$.

The basic idea behind the protocol is that if the waiting times $g_i(t)$ of the two
nodes are different, the ‘slowest’ node will become root. Since with probability
one the outcomes of the waiting times $g_i(t)$ will eventually be different, the root
contention protocol will terminate with probability one [33].

Apart from the performance analysis of the protocol that we report on in this
paper, the specification of Fig. 3 may readily be used for a functional analysis
of the protocol. The condition event structure generated by Forest for this
process algebraic expression contains 57 events (of which 8 are cut-off events)
and 210 conditions.

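To illustrate both algorithms of Sect. 5 we have identified a start state in
the process algebra expression of Fig. 3 (i.e. corresponding with the events
$\{e_1, \ldots, e_n\}$) from which we want to compute the first passage time to another
state (i.e. target event $e$). The start state is defined by the first occurrence of the
following actions: $\mathit{send\_req}_0$, $\mathit{recv\_req}_0$, $\mathit{send\_req}_1$ and $\mathit{recv\_req}_1$. That is, just
before both the $\mathit{check\_req}_0$ and $\mathit{check\_req}_1$ actions are about to happen. In the root
contention protocol, this corresponds to the situation in which both processes
are about to receive the parent request of their contender, which will initiate a
new contention resolution phase. In the graph representation of the corresponding
condition event structure, this set of starting events can easily be identified,
due to the flow relation $\prec$ and the induced causal order $\leq$. The complete $FP(B)$
is omitted due to its size, though.

The root contention protocol is modelled by the following stochastic process algebra
expression:

$$(\mathit{Node}_0 \mid\mid\mid \mathit{Node}_1) \parallel_{Glob} (\mathit{Wire}_0 \mid\mid\mid \mathit{Wire}_1)$$

with the following (process) definitions ($i \in \{0, 1\}$):

$$\begin{array}{lcl}
\mathit{Node}_i & := & (\mathit{Proc}_i \parallel_{Loc} \mathit{Buf}_i) \\
\mathit{Proc}_i & := & (G_i); (\mathit{check\_emp}_i; \mathit{SndReq}_i + \mathit{check\_req}_i; \mathit{SndAck}_i) \\
\mathit{SndAck}_i & := & \mathit{send\_ack}_i; \mathit{root}_i; \mathbf{stop} \\
\mathit{SndReq}_i & := & \mathit{send\_req}_i; (\mathit{check\_req}_i; \mathit{Proc}_i + \mathit{check\_ack}_i; \mathit{child}_i; \mathbf{stop}) \\
\mathit{Buf}_i & := & \mathit{check\_emp}_i; \mathit{Buf}_i + \mathit{recv\_req}_i; \mathit{BufReq}_i + \mathit{recv\_ack}_i; \mathit{BufAck}_i \\
\mathit{BufReq}_i & := & \mathit{check\_req}_i; \mathit{Buf}_i + \mathit{recv\_req}_i; \mathit{BufReq}_i + \mathit{recv\_ack}_i; \mathit{BufAck}_i \\
\mathit{BufAck}_i & := & \mathit{check\_ack}_i; \mathit{Buf}_i + \mathit{recv\_req}_i; \mathit{BufReq}_i + \mathit{recv\_ack}_i; \mathit{BufAck}_i \\
\mathit{Wire}_i & := & \mathit{send\_req}_i; \mathit{WireReq}_i + \mathit{send\_ack}_i; \mathit{WireAck}_i \\
\mathit{WireReq}_i & := & (F_i); \mathit{recv\_req}_{1-i}; \mathit{Wire}_i + \mathit{Wire}_i \\
\mathit{WireAck}_i & := & (F_i); \mathit{recv\_ack}_{1-i}; \mathit{Wire}_i + \mathit{Wire}_i \\
Glob & := & \{\mathit{send\_req}_0, \mathit{send\_req}_1, \mathit{send\_ack}_0, \mathit{send\_ack}_1, \mathit{recv\_req}_0, \mathit{recv\_req}_1, \mathit{recv\_ack}_0, \mathit{recv\_ack}_1\} \\
Loc & := & \{\mathit{check\_emp}_i, \mathit{check\_req}_i, \mathit{check\_ack}_i\}
\end{array}$$

Fig. 3. Process algebra expression of the root contention protocol.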

From these four events, we are interested in the delay until the first occurrence
of the event corresponding with action $\mathit{root}_1$, that is, the first resolution after
the contention, which declares $\mathit{Node}_1$ to be the root. Fig. 4 shows the partial
order of the events leading to this $\mathit{root}_1$ event. It is generated using the graph
drawing tool dot [12]. Note that the events $\mathit{check\_req}_0$ and $\mathit{check\_req}_1$ are indeed
the first events that can occur. Within Forest, distribution events all have a
del_ prefix.

PEPP. For the runtime analysis with Pepp⁶, only the delay events of the
partial order are of interest. Fig. 2 (b) shows the task graph as used by Pepp
containing only the delay events and the elementary start and end events; it is
the projection of Fig. 4 on the delay events. For the events $(G_0)$ and $(G_1)$ we
have used the same uniform distribution function $G(t)$ that is used in [5] for the
transient analysis (via simulation) of the root contention protocol. A graphical
representation of the distribution function $G(t)$ is given in Fig. 5. The unit of
Figs. 5 and 6 is μsec as Pepp requires fixed step-size. The parameters
$D$, $N$ and $I$ are parameters of the density function $g(t)$ which is defined as tuple
$g = (D, N, g_0, \ldots, g_I)$, where $D$ is the displacement between 0 and $g_0$ and $N$ is
the order of the distribution [15]. For the delay event $(F_0)$ we used a uniform
distribution function $F(t)$ whose bounds are based on the assumption that the
transmission speed of the lines is 198 m/μsec.

⁶ For our experiments we used version 3.3 of Pepp (released in July 1993) [7].

Fig. 4. Partial order of the events leading to the $\mathit{root}_1$ event.

We used Pepp to analyse the task graph corresponding with the local
configuration of the $\mathit{root}_1$ event. Fig. 6 shows a graphical representation of the
density function of the first passage time together with the average time (0.51 μsec)
of the partial order leading to $\mathit{root}_1$. Note that the results obtained only relate to
the time of the root contention when contention is resolved on the first attempt
of the protocol. It does not provide information on the transient behaviour of
the protocol.

7 Conclusions 

In this paper we discussed a partial-order semantics for a stochastic process al- 
gebra that supports general (non-memoryless) distributions and combined this 
with an approach to numerically analyse the mean delay between two events. 
Based on an adaptation of McMillan’s complete finite prefix approach tailored
to event structures and process algebra, we used Forest to obtain finite rep- 
resentations for recursive processes. The behaviour between two events is now 
captured by a partial order of events that can be mapped on a stochastic task 
graph. We used Pepp for numerical analysis of such task graphs. 
Fig. 5. Distribution function $G(t)$. (Plot annotations: Gdelay: 24, 28, 57, 60;
mean value = 41.7475, var = 281.104; D = 4800, N = 0, I = 7200.)

Fig. 6. Density function of the runtime of the configuration leading to
the $\mathit{root}_1$ event. (Plot annotations: Result: 51.0892; mean value = 51.0892,
var = 206.302; D = 5048, N = 3, I = 901.)

The paper presents a novel application of McMillan’s finite prefix algorithm. 
Furthermore, the work can be seen as a successor of [4] in the sense that it shows 
the practical feasibility of the use of event structure semantics for stochastic 
analysis. 

A clear advantage of our approach is that we are able to reason about both the
functional and non-functional aspects of systems using the same model and
notation. Furthermore, as our approach uses general distributions, it can still
be used when approximations through exponential distributions are no longer
realistic.

We foresee three different uses of our approach for performance modelling. 
First, it is possible that the first passage time between two events is simply what 
one is interested in, and then our approach yields the answer. Secondly, our 
approach might play an auxiliary role in establishing the right parameters for 
performance models of other types that can then be further analysed. Thirdly, 
our approach might be the first step in a more evolved numerical calculation 
exploiting more features of Pepp. 

In the current setting we are able to compute the first passage time of a single
event. Our next step will be to try to adapt our approach to the combined
runtimes of conflicting events and repetitive events. In [23], Langerak has shown how
to derive a graph rewriting system from the complete finite prefix of a condition
event structure. We are currently studying the use of a graph rewriting system
as the basis for transient analysis with Pepp, using its more advanced node
types like cyclic nodes, hierarchical nodes and probabilistic choice. This would
make it possible to compare our work with discrete-event simulation approaches
like those of [5,6]. Furthermore, we are currently working on a (user) interface to
integrate Forest and Pepp.
Abstract. In previous work we have developed and prototyped a silicon
compiler which translates a functional language (SAFL) into hardware.
Here we present a SAFL-level program transformation which: (i) partitions
a specification into hardware and software parts and (ii) generates
a specialised architecture to execute the software part. The architecture
consists of a number of interconnected heterogeneous processors. Our
method allows a large design space to be explored by systematically
transforming a single SAFL specification to investigate different points
on the area-time spectrum.



1 Introduction 

In [12] we introduced a hardware description language, SAFL (Statically
Allocated Functional Language), and sketched its translation to hardware. An
optimising silicon compiler for SAFL targeting hierarchical RTL Verilog has been
implemented [18] and tested on a number of designs, including a small
commercial processor^1. SAFL is a first-order functional language with an ML [10]
style syntax. We argue the case for functional languages over (say) Occam on the
grounds of easier and more powerful program analysis, transformation and
manipulation techniques. The essential features of SAFL can briefly be summarised
as follows:

— programs are a sequence of function definitions;

— functions can call other defined functions but recursive calls must be
tail-recursive^2. (Section 2.5 addresses the exact technical restrictions.)

^1 The instruction set of Cambridge Consultants XAP processor was implemented (see
www.camcon.co.uk). We did not include the SIF instruction.

^2 Section 2.6.3 shows how this restriction can be removed by mapping general recursive
functions into software.



T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 236-251, 2001. 
(c) Springer-Verlag Berlin Heidelberg 2001

This allows our SAFL silicon compiler to:

— compile SAFL in a resource-aware manner. That is, we map each function
definition into a single hardware-level resource; functions which are called
more than once become shared resources^3.

— synthesise highly parallel hardware — referential transparency allows one to 
evaluate all subexpressions in parallel. 

— statically allocate the storage (e.g. registers and memories) required by a 
SAFL program. 

The SAFL language is designed to facilitate source-to-source transformation. 
Whereas traditional “black-box” synthesis systems synthesise hardware accord- 
ing to user-supplied constraints, our approach is to select a particular implemen- 
tation by applying transformation rules to the SAFL source as a pre-compilation 
phase. We have shown that applying fold/unfold transformations [4] to SAFL 
specifications allows one to explore various time-area tradeoffs at the hardware 
level [12,13]. The purpose of this paper is to demonstrate how hardware/software 
partitioning can be seen as a source-to-source transformation at the SAFL level 
thus providing a formal framework in which to investigate hardware/software co- 
design. In fact we go one step further than traditional co-design since as well as 
partitioning a specification into hardware and software parts our transformation 
procedure can also synthesise an architecture tailored specifically for executing 
the software part. This architecture consists of any number of interconnected 
heterogeneous processors. There are a number of advantages to our approach: 

— Synthesising an architecture specifically to execute a known piece of software 
can offer significant advantages over a fixed architecture [17]. 

— The ability to synthesise multiple processors allows a wide range of area- 
time tradeoffs to be explored. Not only does hardware/software partitioning 
affect the area-time position of the final design, but the number of proces- 
sors synthesised to execute the software part is also significant: increasing 
the number of processors pushes the area up whilst potentially reducing 
execution time (as the processors can operate in parallel). 

— Resource-awareness allows a SAFL specification to represent shared resources.
This increases the power of our partitioning transformation since, for exam- 
ple, multiple processors can access the same hardware resource (see Figure 1 
for an example). 

1.1 A Brief Overview of the SAFL Language 

SAFL is a language of first order recurrence equations; a user program consists 
of a sequence of function definitions: 

fun $f_1(\vec{x}) = e_1$; ... ; fun $f_n(\vec{x}) = e_n$

^3 All sharing issues are handled automatically by our silicon compiler: arbiters are
inserted where necessary to protect shared resources and data-validity analysis is
performed, facilitating the generation of efficient inter-resource interface logic [18].

Programs have a distinguished function, main (usually $f_n$), which represents an
external world interface: at the hardware level it accepts values on an input
port and may later produce a value on an output port. The abstract syntax of
SAFL expressions, $e$, is as follows (we abbreviate tuples $(e_1, \ldots, e_k)$ as $\vec{e}$ and
similarly $(x_1, \ldots, x_k)$ as $\vec{x}$):

— variables: $x$; constants: $c$;

— user function calls: $f(\vec{e})$;

— primitive function calls: $a(\vec{e})$, where $a$ ranges over primitive operators (e.g.
+, -, <=, && etc.);

— conditionals: $e_1$ ? $e_2$ : $e_3$; and

— let bindings: let $\vec{x} = \vec{e}$ in $e_0$ end

See Figures 3 and 4 for concrete examples of SAFL code. 
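
As an informal illustration of this abstract syntax, the following Python sketch models each expression form as a small data class and gives a naive evaluator. It is not part of the SAFL tool chain: the class names, the `PRIMS` table and the convention that a non-zero test selects the first branch of a conditional are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Var:              # variables: x
    name: str


@dataclass
class Const:            # constants: c
    value: int


@dataclass
class Call:             # user function calls: f(e1, ..., ek)
    fn: str
    args: List["Expr"]


@dataclass
class Prim:             # primitive function calls: a(e1, ..., ek)
    op: str
    args: List["Expr"]


@dataclass
class Cond:             # conditionals: e1 ? e2 : e3
    test: "Expr"
    then: "Expr"
    other: "Expr"


@dataclass
class Let:              # let bindings: let x1..xk = e1..ek in e0 end
    names: List[str]
    values: List["Expr"]
    body: "Expr"


Expr = Union[Var, Const, Call, Prim, Cond, Let]
Program = Dict[str, Tuple[List[str], Expr]]   # name -> (parameters, body)

# Hypothetical primitive operators, chosen only for this example.
PRIMS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}


def evaluate(e: Expr, env: Dict[str, int], prog: Program) -> int:
    """Evaluate expression e in environment env (assumed: non-zero means true)."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Prim):
        return PRIMS[e.op](*(evaluate(a, env, prog) for a in e.args))
    if isinstance(e, Cond):
        branch = e.then if evaluate(e.test, env, prog) != 0 else e.other
        return evaluate(branch, env, prog)
    if isinstance(e, Let):
        values = [evaluate(v, env, prog) for v in e.values]
        return evaluate(e.body, {**env, **dict(zip(e.names, values))}, prog)
    if isinstance(e, Call):
        params, body = prog[e.fn]
        args = [evaluate(a, env, prog) for a in e.args]
        return evaluate(body, dict(zip(params, args)), prog)
    raise TypeError(f"not a SAFL expression: {e!r}")


# fun tri(x, acc) = x ? tri(x - 1, acc + x) : acc   (tail-recursive)
TRI: Program = {
    "tri": (["x", "acc"],
            Cond(Var("x"),
                 Call("tri", [Prim("-", [Var("x"), Const(1)]),
                              Prim("+", [Var("acc"), Var("x")])]),
                 Var("acc")))
}
result = evaluate(Call("tri", [Const(4), Const(0)]), {}, TRI)
```

The evaluator is only a reference semantics for experimenting with small programs; it says nothing about how the silicon compiler allocates resources.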



1.2 Comparison with Other Work 

Previous work on compiling declarative specifications to hardware has centred 
on how functional languages themselves can be used as tools to aid the design 
of circuits. Sheeran et al.'s muFP [19] and Lava [2] systems use functional
programming techniques (such as higher order functions) to express concisely the
repeating structures that often appear in hardware circuits. In this framework, 
using different interpretations of primitive functions corresponds to various op- 
erations including behavioural simulation and netlist generation. Our approach 
takes SAFL constructs (rather than gates) as primitive. Although this restricts 
the class of circuits we can describe to those which satisfy certain high-level prop- 
erties, it permits high-level analysis and optimisation yielding efficient hardware. 
(A more detailed comparison of SAFL with other hardware description languages 
including Verilog, VHDL, ELLA and Lustre can be found in [13]). 

Hardware/software co-design is well-studied and many tools have been built 
to aid the partitioning process [3,6,1]. Although these systems differ in their 
approach to co-design they are similar in so far as partitioning is a “black- 
box” phase performed as part of the synthesis process. By making partition- 
ing visible at the source-level we believe our approach to be more flexible — 
hardware/software co-design is just one of a library of source-to-source transfor- 
mations which can be applied incrementally to explore a wide range of architec- 
tural trade-offs. 

The idea of converting a program into a parameterised processor and corre- 
sponding instruction memory is not new; Page described a similar transforma- 
tion [17] within the framework of Handel [16] (a subset of Occam for which a 
silicon compiler was written). Whereas Page's transformation allowed a designer
to synthesise a single parameterised processor, our method allows one to gen- 
erate a much more general architecture consisting of multiple communicating 
processors accessing a set of (potentially shared) hardware resources. 

The impact of source-to-source transformation has been investigated in the 
context of imperative hardware description languages [20,14]. We argue that 
program transformation is a more powerful technique in the SAFL domain for
two reasons: 



— The functional properties of SAFL allow equational reasoning and hence 
make a wide range of transformations applicable (as we do not have to worry 
about side effects). 

— The resource-aware properties of SAFL give fold/unfold transformations
precise meaning at the design-level (e.g. we know that duplicating a function
definition in the source is guaranteed to duplicate the corresponding resource
in the generated circuit).



2 Technical Details 



The first step in the partitioning transformation is to define a partitioning
function, $\pi$, specifying which SAFL functions are to be implemented directly in
hardware and which are to be mapped to a processor for software execution.
Automated partitioning is not the subject of this paper; we assume that $\pi$ is
supplied by the user. For expository purposes we initially describe a
transformation where all processors are variants of a stack machine: Section 2.1
describes the operation of the stack machine and Section 2.2 shows how it can be
encoded as a SAFL function; a compiler from SAFL to stack code is presented
in Section 2.3. In Section 2.6 we generalise our partitioning transformation to a
network of heterogeneous processors.



[Figure 1: three panels, labelled (a) SAFL, (b) SAFL and (c) Final Design]

Partitioning: (a) shows the call-graph of a SAFL specification, $P$; (b) shows the
call-graph of $\hat{\pi}(P)$, where $\pi = \{(f, M_1), (h, M_1), (i, M_2), (j, M_2)\}$. $IM_1$ and $IM_2$
are instruction memory functions (see Section 2.2); (c) shows the structure of the
final circuit after compilation. The box marked 'A' represents an arbiter (inserted
automatically by the SAFL compiler) protecting shared resource $k$; the bold arrows
represent calls, the dotted arrows represent return values.

Fig. 1. A diagrammatic view of the partitioning transformation

Let $\mathcal{M}$ be the set of processor instances used in the final design. We define a
(partial) partitioning function

$\pi : \text{SAFL function name} \rightarrow \mathcal{M}$

mapping the function definitions in our SAFL specification onto processors in $\mathcal{M}$.
$\pi(f)$ is the processor on which function $f$ is to be implemented. If $f \notin Dom(\pi)$
then we realise $f$ in hardware, otherwise we say that $f$ is located on machine
$\pi(f)$. Note that multiple functions can be mapped to the same processor.

We extend $\pi$ to a transformation function

$\hat{\pi} : \text{SAFL Program} \rightarrow \text{SAFL Program}$

such that given a SAFL program, $P$, $\hat{\pi}(P)$ is another SAFL program which
respects the partitioning function $\pi$. Figure 1 shows the effect of a partitioning
transformation, $\hat{\pi}$, where

$\mathcal{M} = \{M_1, M_2\}$; and

$\pi = \{(f, M_1), (h, M_1), (i, M_2), (j, M_2)\}$

In this example we see that $g$ and $k$ are implemented in hardware since
$g, k \notin Dom(\pi)$. $\hat{\pi}(P)$ contains function definitions: $M_1$, $M_2$, $IM_1$, $IM_2$, $g$ and $k$,
where $M_1$ and $M_2$ are processor instances and $IM_1$ and $IM_2$ are instruction
memories (see Section 2.2).
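
To make the example concrete, the partial partitioning function can be pictured as a dictionary. The Python sketch below is only an illustration: the function names f to k are taken from the example of Figure 1, and the variable names and the string encoding of machine names are assumptions.

```python
# Functions of the example specification P (names assumed from Figure 1).
functions = ["f", "g", "h", "i", "j", "k"]

# The partial partitioning function: software functions -> processor instance.
pi = {"f": "M1", "h": "M1", "i": "M2", "j": "M2"}

# Functions outside Dom(pi) are realised directly in hardware.
hardware = [name for name in functions if name not in pi]

# Group the software functions by the machine they are located on.
located = {}
for name, machine in pi.items():
    located.setdefault(machine, []).append(name)

# Definitions expected in the transformed program: one processor instance and
# one instruction memory per machine, plus the untouched hardware functions.
machines = sorted(located)
transformed_definitions = machines + ["I" + m for m in machines] + hardware
```

With this input, `hardware` holds g and k, and `transformed_definitions` lists M1, M2, IM1, IM2, g and k, matching the description of the transformed program.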

2.1 The Stack Machine Template 

Our stack machine can be seen as a cut-down version of both Landin's SECD
machine [9] and Cardelli's Functional Abstract Machine [5]. Each instruction has
an op-code field and an operand field n. The following instructions are defined:



PushC(n)      push constant n onto the stack
PushV(n)      push variable (from offset n into the current stack)
PushA(n)      push the value of the stack machine's argument a_n (see
              Section 2.2) to the stack
Squeeze(n)    pop top value; pop next n values; re-push top value
Return(n)     pop result; pop link; pop n arguments; re-push result; branch
              to link
Call_Int(n)   push address of next instruction onto stack and branch to
              address n
Jz(n)         pop a value; if it is zero branch to address n
Jmp(n)        jump to address n
Alu2(n)       pop two values; do 2-operand builtin operation n on them and
              push the result
Halt          terminate the stack machine returning the value on top of the
              stack

We define a family of instructions to allow the stack machine to call external
functions:

Call_Ext_f    pop each of f's arguments from the stack; invoke the external
              function f and push the result to the top of the stack.



The stack machine template, SMT, is an abstract model of the stack machine
parameterised on the code it will have to execute. Given a stack machine
program, s, (i.e. a list of stack machine instructions as outlined above) SMT(s) is a
stack machine instance: a SAFL function encoding a stack machine specialised
for executing s. Our notion of a template is similar to a VHDL generic.
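
The instruction semantics listed above can be tried out with a small software interpreter. The Python sketch below is an illustration only, not the SAFL encoding of Section 2.2: it keeps the stack in a Python list with the top at the end, omits the Call_Ext family, and uses a two-entry `ALU` table (0 for add, 1 for subtract, following the alu2 function of Figure 3). The function name `run`, the tuple encoding of instructions and the step limit are assumptions. The sample program is the triangular-number code of Figure 3 with its labels resolved to addresses.

```python
# Builtin 2-operand operations, indexed as in the alu2 function of Figure 3.
ALU = {0: lambda a, b: a + b, 1: lambda a, b: a - b}


def run(program, args, max_steps=100_000):
    """Execute a list of (opcode, operand) pairs; args holds a_1, a_2, ..."""
    stack, pc = [], 0
    for _ in range(max_steps):
        op, n = program[pc]
        pc += 1                                 # pc now names the next instruction
        if op == "PushC":
            stack.append(n)
        elif op == "PushV":
            stack.append(stack[-1 - n])         # offset 0 is the top of the stack
        elif op == "PushA":
            stack.append(args[n - 1])
        elif op == "Squeeze":
            top = stack.pop()
            del stack[len(stack) - n:]          # drop the next n values
            stack.append(top)
        elif op == "Return":
            result, link = stack.pop(), stack.pop()
            del stack[len(stack) - n:]          # drop the n arguments
            stack.append(result)
            pc = link
        elif op == "Call_Int":
            stack.append(pc)                    # return address
            pc = n
        elif op == "Jz":
            if stack.pop() == 0:
                pc = n
        elif op == "Jmp":
            pc = n
        elif op == "Alu2":
            v2, v1 = stack.pop(), stack.pop()
            stack.append(ALU[n](v1, v2))
        elif op == "Halt":
            return stack[-1]
        else:
            raise ValueError(f"unknown instruction {op}")
    raise RuntimeError("step limit exceeded")


# f(x) = if x then x + f(x-1) else 0, laid out as in Figure 3.
TRIANGULAR = [
    ("PushA", 1), ("Call_Int", 3), ("Halt", 0),       # 0-2: external entry
    ("PushV", 1), ("Jz", 12),                         # 3-4: f: test x
    ("PushV", 1), ("PushV", 2), ("PushC", 1),         # 5-7
    ("Alu2", 1), ("Call_Int", 3), ("Alu2", 0),        # 8-10: x + f(x-1)
    ("Jmp", 13),                                      # 11
    ("PushC", 0),                                     # 12: else branch
    ("Return", 1),                                    # 13
]
triangular_of_4 = run(TRIANGULAR, [4])
```

Unlike a SAFL stack machine instance, this interpreter is not specialised to the program it runs; it is meant purely for checking compiled instruction sequences by hand.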



2.2 Stack Machine Instances 

A stack machine instance, $SM_i \in \mathcal{M}$, is a SAFL function of the form:

fun $SM_i(a_1, \ldots, a_{n_i}, PC, SP) = \ldots$

where $n_i = \max(\{arity(f) \mid \pi(f) = SM_i\})$

Arguments PC and SP are used to store the program counter and stack pointer
respectively; $a_1, \ldots, a_{n_i}$ are used to receive arguments of functions located on $SM_i$.
Each stack machine instance is associated with an instruction memory function,
$IM_i$, of the form:

fun IM_i (address) =
  case address of 0 => instruction_0
                | 1 => instruction_1
                ... etc.

$SM_i$ calls $IM_i(PC)$ to load instructions for execution.

For example, consider a stack machine instance, $SM_{f,h}$, where we choose to
locate functions $f$ (of arity 2) and $h$ (of arity 3). Then $n_{f,h} = 3$ yielding
signature: $SM_{f,h}(a_1, a_2, a_3, PC, SP)$. $IM_{f,h}$ is an instruction memory containing compiled
code for $f$ and $h$. To compute the value of $h(x, y, z)$ we invoke $SM_{f,h}$ with
arguments $a_1 = x$, $a_2 = y$, $a_3 = z$, $PC = ext\_h_{entry}$ ($h$'s external entry point; see
Section 2.3) and $SP = 0$. Similarly to compute the value of $f(x, y)$ we invoke
$SM_{f,h}$ with arguments $a_1 = x$, $a_2 = y$, $a_3 = 0$, $PC = ext\_f_{entry}$ and $SP = 0$. Note
how we pad the a-arguments with 0s since $arity(f) < 3$.
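
The padding rule in this example can be written down directly. The following Python sketch is an illustration under assumptions: the helper names, the machine name `SM_fh` and the entry-point addresses 40 and 10 are hypothetical and do not come from the paper.

```python
def machine_arity(pi, arity, machine):
    """n_i: the largest arity among the functions located on the machine."""
    return max(arity[f] for f, m in pi.items() if m == machine)


def external_call_arguments(pi, arity, entry, f, actuals):
    """Argument list for invoking pi(f) so that it computes f(actuals)."""
    n = machine_arity(pi, arity, pi[f])
    padding = [0] * (n - len(actuals))          # unused a-arguments become 0
    return list(actuals) + padding + [entry[f], 0]   # ..., PC, SP


pi = {"f": "SM_fh", "h": "SM_fh"}
arity = {"f": 2, "h": 3}
entry = {"f": 40, "h": 10}      # hypothetical external entry addresses

call_f = external_call_arguments(pi, arity, entry, "f", [7, 8])
call_h = external_call_arguments(pi, arity, entry, "h", [1, 2, 3])
```

Here `call_f` carries one padding zero before the program counter and stack pointer, while `call_h` needs none.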

The co-design of hardware and software means that instructions and ALU
operations are only added to $SM_i$ if they appear in $IM_i$. Parameterising the stack
machine template in this way can considerably reduce the area of the final design
since we remove redundant logic in each processor instance.

We can consider many other areas of parameterisation. For example we can 
adjust the op-code width and assign op-codes to minimise instruction-decoding 
delay [17]. Figure 4 gives the SAFL code for a 16-bit stack machine instance^4.
An alu2 function and an example stack machine program which computes
triangular numbers are shown in Figure 3.

^4 Approximately 2000 2-input equivalent gates when compiled using the SAFL
silicon compiler. For simplicity we consider a simple stack machine with no Call_Ext
instructions.

2.3 Compilation to Stack Code

Figure 2 gives a compilation function from SAFL to stack based code. Although 
the translation of many SAFL constructs is self-explanatory, the compilation 
rules for function definition and function call require further explanation: 



Compiling Function Definitions 

The code generated for function definition

fun $f(x_1, \ldots, x_k) = e$

requires explanation in that we create 2 distinct entry points for $f$: $f_{entry}$ and
$ext\_f_{entry}$. The internal entry point, $f_{entry}$, is used when $f$ is invoked internally
(i.e. with a Call_Int instruction). The external entry point, $ext\_f_{entry}$, is used
when $f$ is invoked externally (i.e. via a call to $\pi(f)$, the machine on which $f$
is implemented). In this latter case, we simply execute $k$ PushA instructions to
push $f$'s arguments onto the stack before jumping to $f$'s internal entry point,
$f_{entry}$.



Compiling Function Calls 

Suppose function $g$ is in software ($g \in Dom(\pi)$) and calls function $f$. The code
generated for the call depends on the location of $f$ relative to $g$. There are three
possibilities:

1. If $f$ and $g$ are both implemented in software on the same machine
($f \in Dom(\pi) \wedge \pi(f) = \pi(g)$) then we simply push each of $f$'s arguments to the
stack and branch to $f$'s internal entry point with a Call_Int instruction.
The Call_Int instruction pushes the return address and jumps to $f_{entry}$;
the compiled code for $f$ is responsible for popping the arguments and link
leaving the return value on the top of the stack.

2. If $f$ is implemented in hardware ($f \notin Dom(\pi)$) then we push each of $f$'s
arguments to the stack and invoke the hardware resource corresponding to
$f$ by means of a Call_Ext_f instruction. The Call_Ext_f instruction pops each
of $f$'s arguments, invokes resource $f$ and pushes $f$'s return value to the stack.

3. If $f$ and $g$ are both implemented in software but on different machines
($f, g \in Dom(\pi) \wedge \pi(f) \neq \pi(g)$) then $g$ needs to invoke $\pi(f)$ (the machine on which
$f$ is located). We push $\pi(f)$'s arguments to the stack: the arguments for $f$
possibly padded by 0s (see Section 2.2) followed by the program counter PC
initialised to $ext\_f_{entry}$ and the stack pointer SP initialised to 0. We then
invoke $\pi(f)$ using a Call_Ext_{\pi(f)} instruction.




Hardware/ Software Co-design Using Functional Languages 243 



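Let $\sigma$ be an environment mapping variable names to stack offsets (offset 0 signifies
the top of the stack). Let $g$ be the name of the function we are compiling. Then
$[\![\,\cdot\,]\!]_g\,\sigma$ gives an instruction list corresponding to $g$. (We omit $g$ for readability in
the following; it is only used to identify whether a called function is located on
the same machine.)

We use the notation $\sigma\{x \mapsto n\}$ to represent environment $\sigma$ extended with $x$
mapping to $n$. $\sigma^{+n}$ represents an environment constructed by incrementing all stack
offsets in $\sigma$ by $n$, i.e. $\sigma^{+n}(x) = \sigma(x) + n$. $\emptyset$ is the empty environment. The infix
operator @ appends instruction lists. $Repeat(l, n)$ is $l$ @ ... @ $l$ ($n$ times); (this is
used to generate instruction sequences to pad argument lists with 0s).

\[
\begin{aligned}
[\![c]\!]\,\sigma &\;\stackrel{def}{=}\; [\text{PushC}(c)] \\
[\![x]\!]\,\sigma &\;\stackrel{def}{=}\; [\text{PushV}(\sigma(x))] \\
[\![f(e_1,\ldots,e_k)]\!]\,\sigma &\;\stackrel{def}{=}\;
\begin{cases}
[\![e_1]\!]\,\sigma \;@\; [\![e_2]\!]\,\sigma^{+1} \;@\; \cdots \;@\; [\![e_k]\!]\,\sigma^{+(k-1)} \;@\; [\text{Call\_Ext}_f]
  & \text{if } f \notin Dom(\pi) \\[4pt]
[\![e_1]\!]\,\sigma \;@\; [\![e_2]\!]\,\sigma^{+1} \;@\; \cdots \;@\; [\![e_k]\!]\,\sigma^{+(k-1)} \;@\; [\text{Call\_Int}(f_{entry})]
  & \text{if } f \in Dom(\pi) \wedge \pi(f) = \pi(g) \\[4pt]
[\![e_1]\!]\,\sigma \;@\; [\![e_2]\!]\,\sigma^{+1} \;@\; \cdots \;@\; [\![e_k]\!]\,\sigma^{+(k-1)} \\
\quad @\; Repeat([\text{PushC}(0)],\; arity(\pi(f)) - 2 - k) \\
\quad @\; [\text{PushC}(ext\_f_{entry}),\; \text{PushC}(0),\; \text{Call\_Ext}_{\pi(f)}]
  & \text{if } f \in Dom(\pi) \wedge \pi(f) \neq \pi(g)
\end{cases} \\
[\![a(e_1,e_2)]\!]\,\sigma &\;\stackrel{def}{=}\; [\![e_1]\!]\,\sigma \;@\; [\![e_2]\!]\,\sigma^{+1} \;@\; [\text{Alu2}(a)] \\
[\![\text{let } x = e_1 \text{ in } e_2]\!]\,\sigma &\;\stackrel{def}{=}\; [\![e_1]\!]\,\sigma \;@\; [\![e_2]\!]\,\sigma^{+1}\{x \mapsto 0\} \;@\; [\text{Squeeze}(1)] \\
[\![e_1 \;?\; e_2 : e_3]\!]\,\sigma &\;\stackrel{def}{=}\; \text{let } l \text{ and } l' \text{ be new labels in} \\
&\qquad [\![e_1]\!]\,\sigma \;@\; [\text{Jz}(l)] \;@\; [\![e_2]\!]\,\sigma \;@\; [\text{Jmp}(l'),\; \text{label: } l] \\
&\qquad @\; [\![e_3]\!]\,\sigma \;@\; [\text{label: } l'] \\
[\![\text{fun } g(x_1,\ldots,x_k) = e]\!] &\;\stackrel{def}{=}\; [\text{label: } g_{entry}] \;@\; [\![e]\!]_g\,\emptyset\{x_k \mapsto 1,\; x_{k-1} \mapsto 2,\; \ldots,\; x_1 \mapsto k\} \\
&\qquad @\; [\text{Return}(k)] \\
&\qquad @\; [\text{label: } ext\_g_{entry},\; \text{PushA}(1),\; \ldots,\; \text{PushA}(k),\; \text{Call\_Int}(g_{entry}),\; \text{Halt}]
\end{aligned}
\]

Fig. 2. Compiling SAFL into Stack Code for Execution on a Stack Machine
Instance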
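
A compilation function of this shape is short enough to prototype. The Python sketch below is an illustration under assumptions, not the authors' compiler: expressions are encoded as nested tuples, only calls to functions on the same machine are handled (the Call_Int case), labels are plain strings, and `resolve` is a hypothetical helper that turns labels into addresses. Placing the external entry code before the internal code follows the layout of the example program in Figure 3.

```python
import itertools

_fresh = itertools.count()      # supply of new label names


def shift(env, n):
    """Environment with every stack offset incremented by n."""
    return {x: offset + n for x, offset in env.items()}


def compile_expr(e, env):
    """Instruction list for expression e under environment env."""
    kind = e[0]
    if kind == "const":
        return [("PushC", e[1])]
    if kind == "var":
        return [("PushV", env[e[1]])]
    if kind == "prim":                          # a(e1, e2)
        _, a, e1, e2 = e
        return compile_expr(e1, env) + compile_expr(e2, shift(env, 1)) + [("Alu2", a)]
    if kind == "call":                          # same-machine calls only
        _, f, args = e
        code = []
        for i, arg in enumerate(args):          # i values already pushed
            code += compile_expr(arg, shift(env, i))
        return code + [("Call_Int", f + "_entry")]
    if kind == "let":                           # let x = e1 in e2
        _, x, e1, e2 = e
        inner = shift(env, 1)
        inner[x] = 0
        return compile_expr(e1, env) + compile_expr(e2, inner) + [("Squeeze", 1)]
    if kind == "cond":                          # e1 ? e2 : e3
        _, e1, e2, e3 = e
        l1, l2 = f"L{next(_fresh)}", f"L{next(_fresh)}"
        return (compile_expr(e1, env) + [("Jz", l1)]
                + compile_expr(e2, env) + [("Jmp", l2), ("label", l1)]
                + compile_expr(e3, env) + [("label", l2)])
    raise ValueError(f"unknown expression kind {kind}")


def compile_fun(name, params, body):
    """Return (internal code, external entry code) for one definition."""
    k = len(params)
    env = {x: k - i for i, x in enumerate(params)}   # x_k -> 1, ..., x_1 -> k
    internal = ([("label", name + "_entry")]
                + compile_expr(body, env) + [("Return", k)])
    external = ([("label", "ext_" + name + "_entry")]
                + [("PushA", i) for i in range(1, k + 1)]
                + [("Call_Int", name + "_entry"), ("Halt", 0)])
    return internal, external


def resolve(code):
    """Drop labels and replace branch targets by instruction addresses."""
    address, n = {}, 0
    for op, arg in code:
        if op == "label":
            address[arg] = n
        else:
            n += 1
    branches = {"Jz", "Jmp", "Call_Int"}
    return [(op, address[arg] if op in branches else arg)
            for op, arg in code if op != "label"]


# f(x) = if x then x + f(x-1) else 0
body = ("cond", ("var", "x"),
        ("prim", "add", ("var", "x"),
         ("call", "f", [("prim", "sub", ("var", "x"), ("const", 1))])),
        ("const", 0))
internal, external = compile_fun("f", ["x"], body)
program = resolve(external + internal)
```

For this input the resolved program has the same instruction sequence as the listing in Figure 3, with the ALU operations named rather than numbered.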

2.4 The Partitioning Transformation

Having introduced the stack machine (Section 2.1) and the associated compilation
function (Section 2.3) the details of the partitioning transformation, $\hat{\pi}$, are
as follows:

Let $P$ be the SAFL program we wish to transform using $\hat{\pi}$. Let $f$ be a SAFL
function in $P$ with definition $d_f$ of the form

fun $f(x_1, \ldots, x_k) = e$

We construct a partitioned program $\hat{\pi}(P)$ from $P$ as follows:

1. For each function definition $d_f \in P$ to be mapped to hardware (i.e.
$f \notin Dom(\pi)$) create a variant in $\hat{\pi}(P)$ which is as $d_f$ but for each call
$g(e_1, \ldots, e_k)$: if $g \in Dom(\pi)$ then replace the call $g(\vec{e})$ with a call:

$m(e_1, \ldots, e_k, \underbrace{0, \ldots, 0}_{arity(m) - 2 - k}, ext\_g_{entry}, 0)$

where $m = \pi(g)$, the stack machine instance on which $g$ is located.

2. For each $m \in \mathcal{M}$:

(a) Compile instruction sequences for functions located on $m$:

$Code_m = \{ [\![d_f]\!] \mid \pi(f) = m \}$

(b) Generate machine code for $m$, $MCode_m$, by resolving symbols in $Code_m$,
assigning opcodes and converting into binary representation.

(c) Generate an instruction memory for $m$ by adding a function definition,
$IM_m$, to $\hat{\pi}(P)$ of the form:

fun IM_m (address) =
  case address of 0 => instruction_0
                | 1 => instruction_1
                ... etc.

where each instruction_i is taken from $MCode_m$.

(d) Generate a stack machine instance, $SMT(Code_m)$, and append it to $\hat{\pi}(P)$.

For each $m \in \mathcal{M}$, $\hat{\pi}(P)$ contains a corresponding processor instance and
instruction memory function. When $\hat{\pi}(P)$ is compiled to hardware,
resource-awareness ensures that each processor definition function becomes a single
processor and each instruction memory function becomes a single instruction
memory. The remaining functions in $\hat{\pi}(P)$ are mapped to hardware resources as
required. Function calls are synthesised into optimised communication paths
between the hardware resources (see Figure 1c).

2.5 Validity of Partitioning Functions

This section concerns some fine technical details; it can be skipped on first
reading.

We clarify the SAFL restriction on recursion^5 given in the Introduction as
follows.

In order for a SAFL program to be valid, all recursive calls, including those
calls which form part of a mutually-recursive cycle, may only occur in tail-context.
Non-recursive calls may appear freely.

This allows storage for SAFL variables to be allocated statically as tail
recursion does not require the dynamic allocation of stack frames.

Unfortunately, in general, a partitioning function, $\pi$, may transform a valid
SAFL program, $P$, into an invalid SAFL program, $\hat{\pi}(P)$, which does not satisfy
the recursion restrictions. For example consider the following program, $P_{bad}$:

fun f(x) = x+1; 
fun g(x) = f(x)+2; 
fun h(x) = g(x+3) ; 

Partitioning $P_{bad}$ with $\pi = \{(f, SM), (h, SM)\}$ yields a new program, $\hat{\pi}(P_{bad})$, of
the form:

fun IM(PC) = ...

fun SM(x,PC,SP) = ... let t = <top-of-stack>
                      in g(t) ...

fun g(x) = SM(x, <ext_f_entry>, 0) + 2;

$\hat{\pi}(P_{bad})$ has invalid recursion between g and SM. The problem is that the call to
SM in the body of g is part of a mutually-recursive cycle and is not in tail-context.

We therefore require a restriction on partitions $\pi$ to ensure that if $P$ is a valid
SAFL program then $\hat{\pi}(P)$ will also be a valid SAFL program. For the purposes
of this paper we give the following sufficient condition:

$\pi$ is a valid partition with respect to SAFL program, $P$, iff all cycles occurring
in the call graph of $\hat{\pi}(P)$ already exist in the call graph of $P$, with the exception of
self-cycles generated by direct tail-recursion.

Thus, in particular, new functions in $\hat{\pi}(P)$ (i.e. stack machines and their
instruction memories) must not have mutual recursion with any other functions.
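
The last observation suggests a simple mechanical check. The Python sketch below is an illustration only: it tests the stated consequence (no newly introduced function is mutually recursive with another function), not the full condition on cycles, and the dictionary encoding of call graphs and the function names are assumptions. The first graph corresponds to the transformed bad example above; the second assumes the alternative partition that maps only f to SM.

```python
def reachable(graph, start):
    """All functions that can be reached from start by one or more calls."""
    seen, todo = set(), [start]
    while todo:
        for callee in graph.get(todo.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                todo.append(callee)
    return seen


def in_mutual_recursion(graph, name):
    """True if name lies on a call cycle through some other function."""
    return any(other != name and name in reachable(graph, other)
               for other in reachable(graph, name))


# Call graph after partitioning with pi = {(f, SM), (h, SM)}:
# SM loops on itself, loads from IM and calls the hardware function g,
# while g calls SM to evaluate f.
bad = {"IM": set(), "SM": {"IM", "SM", "g"}, "g": {"SM"}}

# Call graph after partitioning with pi = {(f, SM)} (assumed alternative):
# g and h stay in hardware and SM never calls back into them.
ok = {"IM": set(), "SM": {"IM", "SM"}, "g": {"SM"}, "h": {"g"}}

bad_is_rejected = in_mutual_recursion(bad, "SM")
ok_is_accepted = not any(in_mutual_recursion(ok, m) for m in ("SM", "IM"))
```

Self-cycles such as the tail-recursive loop of SM are deliberately ignored by `in_mutual_recursion`, in line with the exception made in the condition.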

2.6 Extensions 

Fine Grained Partitioning We have presented a program transformation 
to map function definitions to hardware or software, but what if we want to 
map part of a function definition to hardware and the rest to software? This 
can be achieved by applying fold/unfold transformations before our partitioning 
transformation. For example, consider the function 

^5 A more formal presentation can be found in [12].

f(x,y) = if x=0 then y
         else f(x-1, x*y - 7 + 5*x)

If we choose to map f to software our design will contain a processor and
associated machine code consisting of a sequence of instructions representing multiply
x and y, subtract 7, add 5 times x. However, consider transforming f with a
single application of the fold-rule [4]:

i(x,y) = x*y - 7 + 5*x

f(x,y) = if x=0 then y else f(x-1, i(x,y))

Now mapping f to software and i to hardware leads to a software representation
for f containing fewer instructions and a specialised processor with an
x*y - 7 + 5*x instruction.
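
That the fold step leaves the meaning of f unchanged can be checked on small inputs. The Python sketch below is only an illustration; it takes the body to be x*y - 7 + 5*x, as the prose description (multiply x and y, subtract 7, add 5 times x) indicates, and the function names are assumptions.

```python
def f_original(x, y):
    # f before folding: the whole arithmetic expression is part of f.
    return y if x == 0 else f_original(x - 1, x * y - 7 + 5 * x)


def i(x, y):
    # The folded-out function; a candidate for a dedicated hardware resource.
    return x * y - 7 + 5 * x


def f_folded(x, y):
    # f after one application of the fold rule.
    return y if x == 0 else f_folded(x - 1, i(x, y))


agree = all(f_original(x, y) == f_folded(x, y)
            for x in range(6) for y in range(-3, 4))
```

In the partitioned design, each evaluation of `i` would correspond to a single call to the hardware resource rather than a sequence of ALU instructions.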



Dealing with Heterogeneous Processors So far we have only considered 
executing software on a network of stack machines. Although the stack machine 
is a familiar choice for expository purposes, in a real design one would often 
prefer to use different architectures. For example, specialised VLIW [8] archi- 
tectures are a typical choice for data-dominated embedded systems since many 
operations can be performed in parallel without the overhead of dynamic instruc- 
tion scheduling. The commercial “Art Designer” tool [1] partitions a C program 
into hardware and software by constructing a single specialised VLIW processor 
and compiling code for it. In general, however, designs often consist of multiple 
communicating processors chosen to reflect various cost and performance con- 
straints. Our framework can be extended to handle a network of heterogeneous 
processors as follows: 

Let Templates be a set of processor templates (c.f. the stack machine
template, SMT, in Section 2.1).

Let Compilers be a set of compilers from SAFL to machine code for processor
templates.

As part of the transformation process, the user now specifies two extra
functions:

$\delta : \mathcal{M} \rightarrow Templates$
$\tau : \mathcal{M} \rightarrow Compilers$

$\delta$ maps each processor instance, $m \in \mathcal{M}$, onto a SAFL processor template and
$\tau$ maps each $m \in \mathcal{M}$ onto an associated compiler. We then modify the
transformation procedure described in Section 2.4 to generate a partitioned program,
$\hat{\pi}_{\delta,\tau}(P)$, as follows: for each $m \in \mathcal{M}$ we generate machine code, $MCode_m$,
using compiler $\tau(m)$; we then use processor template, $MT = \delta(m)$, to generate
processor instance $MT(MCode_m)$ and append this to $\hat{\pi}_{\delta,\tau}(P)$.



Extending the SAFL Language Recall that the SAFL language specifies 
that all recursive calls must be in tail-context. Since only tail-recursive calls are 
permitted, our silicon compiler is able to statically allocate all the storage needed
for a SAFL program.

As an example of these restrictions consider the following definitions of the 
factorial function: 

rfact(x) = if x=0 then 1 else x*rfact(x-1)
ifact(x,a) = if x=0 then a else ifact(x-1,x*a)

rfact is not a valid SAFL program since the recursive call is not in a
tail-context. However the equivalent tail-recursive factorial function, ifact, which
uses a second argument to accumulate partial results, is a valid SAFL program.
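
The difference between the two definitions can be seen in a direct transcription. The Python sketch below is only an illustration, with multiplication as the combining operator; `ifact_loop` is a hypothetical name for the loop that the tail-recursive form corresponds to.

```python
def rfact(x):
    # Not tail-recursive: the multiplication happens after the call returns,
    # so every pending call needs its own saved copy of x.
    return 1 if x == 0 else x * rfact(x - 1)


def ifact(x, a):
    # Tail-recursive: the recursive call is the last thing that happens.
    return a if x == 0 else ifact(x - 1, x * a)


def ifact_loop(x, a):
    # The same computation as ifact, written as a loop over two variables.
    # This is why a fixed, statically allocated amount of storage suffices.
    while x != 0:
        x, a = x - 1, x * a
    return a


values = [(rfact(n), ifact(n, 1), ifact_loop(n, 1)) for n in range(7)]
```

All three functions agree on these inputs; only the first needs storage that grows with the depth of the recursion.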

Although one can sometimes transform a non-tail recursive program into an 
equivalent tail-recursive one, this is not always easy or natural. The transforma- 
tion of factorial into its tail-recursive equivalent is only possible because multipli- 
cation is an associative operator. Thus, in general we require a way of extending 
SAFL to handle general unrestricted recursion. Our partitioning transformation 
provides us with one way to do this: 

Consider a new language, SAFL+, constructed by removing the recursion
restrictions from SAFL. We can use our partitioning transformation to transform
SAFL+ to SAFL simply by ensuring that each function definition containing
recursion other than in a tail-call context is mapped to software. Note that
our compilation function (Figure 2) is already capable of dealing with general
recursion without any modification.

3 Conclusions and Further Work 

Source-level program transformation of a high level HDL is a powerful technique 
for exploring a wide range of architectural tradeoffs from an initial specification. 
The partitioning transformation outlined here is applicable to any hardware de- 
scription language (e.g. VHDL or Verilog) given suitable compilation functions 
and associated processor templates. However, we believe that equational rea- 
soning makes program transformation a particularly powerful technique in the 
SAFL domain. 

We are in the process of deploying the techniques outlined here as part of 
a semi-automated transformation system for SAFL programs. The goal of the 
project is to develop a framework in which a SAFL program can be system- 
atically transformed to investigate a large number of possible implementations 
of a single specification. So far we have developed a library of transformations 
which allow us to represent a wide range of concepts in hardware design in- 
cluding: resource sharing/duplication, static/dynamic scheduling [13] and now 
hardware/software partitioning. In the future we plan to investigate how partial 
evaluation techniques [7] can be used to transform a processor definition func- 
tion and its corresponding instruction memory function into a single unit with 
hardwired control. 

Although initial results have been promising, the project is still in its early 
stages. We are currently investigating ways of extending the SAFL language to 
make it more expressive without losing too many of its mathematical properties.
Our current ideas centre around adding synchronous communication and a
restricted form of π-calculus [11] style channel passing. We believe that this will
allow us to capture the semantics of I/O whilst maintaining the correspondence 
between high-level function definitions and hardware-level resources. 

(*-----------------------------------------------------------+
 | SAFL specification of simple stack processor              |
 | Richard Sharp and Alan Mycroft, July 2000                 |
 +-----------------------------------------------------------*)

fun alu2(op:16, a1:16, a2:16):16 =
  case op of 0 => a1+a2
           | 1 => a1-a2
           | 2 => a1&&a2
           | 3 => a1||a2
           | 4 => a1^^a2
           | 16 => a1<a2
           | 17 => a1>a2
           | 18 => a1=a2
           | 19 => a1>=a2
           | 20 => a1<=a2
           | 21 => a1<>a2

(* Instruction memory here *)
(* The following codes: f(x) = if x then x+f(x-1) else 0; *)
(* i.e. it computes triangular numbers *)

fun load_instruction (address:16):24 = case address of
     0 => %000010010000000000000001    (*      pusha 1    *)
     1 => %000001010000000000000011    (*      call_int f *)
     2 => %000000000000000000000000    (*      halt       *)
     3 => %000000100000000000000001    (* f:   pushv 1    *)
     4 => %000001110000000000001100    (*      jz l1      *)
     5 => %000000100000000000000001    (*      pushv 1    *)
     6 => %000000100000000000000010    (*      pushv 2    *)
     7 => %000000010000000000000001    (*      pushc 1    *)
     8 => %000010000000000000000001    (*      alu2 sub   *)
     9 => %000001010000000000000011    (*      call_int f *)
    10 => %000010000000000000000000    (*      alu2 add   *)
    11 => %000001100000000000001101    (*      jmp l2     *)
    12 => %000000010000000000000000    (* l1:  pushc 0    *)
    13 => %000001000000000000000001    (* l2:  return 1   *)
    default => %101010101010101010101010  (*   illop      *)

external mem_acc (address:16, data:16, write:1) : 16

inline fun data_read (address:16) : 16 = mem_acc(address, 0, 0)

inline fun data_write (address:16, data:16) : 16 = mem_acc(address, data, 1)

Fig. 3. The Stack Machine (Part 1 of 2)

(* Stack Machine Instance *)

fun SMachine (a1:16, PC:16, SP:16):16 =
  let var new_PC  : 16 = PC + 1
      var instr   : 24 = load_instruction(PC)
      var op_code : 8  = instr[23,16]
      var op_rand : 16 = instr[15,0]
      var inc_SP  : 16 = SP + 1
      var dec_SP  : 16 = SP - 1
  in
    case op_code of
      0 => (* halt, returning TOS *)
           data_read(SP)
    | 1 => (* push constant operation *)
           data_write(dec_SP, op_rand);
           SMachine(a1, new_PC, dec_SP)
    | 2 => (* push variable operation *)
           let var data:16 = data_read(SP+op_rand)
           in data_write(dec_SP, data);
              SMachine(a1, new_PC, dec_SP) end
    | 9 => (* push a-argument operation *)
           data_write(dec_SP, a1);
           SMachine(a1, new_PC, dec_SP)
    | 3 => (* squeeze operation -- op_rand is how many locals to pop *)
           let var new_SP:16 = SP + op_rand
               var v:16 = data_read(SP)
           in data_write(new_SP, v);
              SMachine(a1, new_PC, new_SP) end
    | 4 => (* return operation -- op_rand is how many actuals to pop *)
           let var new_SP:16 = inc_SP + op_rand
               var rv:16 = data_read(SP)
           in let var rl:16 = data_read(inc_SP)
              in data_write(new_SP, rv);
                 SMachine(a1, rl, new_SP) end end
    | 5 => (* call_int operation *)
           data_write(dec_SP, new_PC);
           SMachine(a1, op_rand, dec_SP)
    | 6 => (* jmp (abs) operation *)
           SMachine(a1, op_rand, SP)
    | 7 => (* jz (abs) operation *)
           let var v:16 = data_read(SP)
           in SMachine(a1, if v=0 then op_rand else new_PC, inc_SP) end
    | 8 => (* alu2: binary alu operation -- specified by immediate field *)
           let var v2:16 = data_read(SP)
           in let var v1:16 = data_read(inc_SP)
              in data_write(inc_SP, alu2(op_rand, v1, v2));
                 SMachine(a1, new_PC, inc_SP) end end
    default =>
           (* halt, returning 0xffff -- illegal opcode *)
           %1111111111111111
  end

Fig. 4. The Stack Machine (Part 2 of 2)

Abstract. A system of conservative transformation rules is presented for abstracting
memories whose forwarding logic interacts with stalling conditions for preserving the
memory semantics in microprocessors with in-order execution. Microprocessor correctness
is expressed in the logic of Equality with Uninterpreted Functions and Memories (EUFM)
[6]. Memory reads and writes are abstracted as arbitrary uninterpreted functions in such
a way that the forwarding property of the memory semantics (that a read returns the data
most recently written to an equal write address) is satisfied completely only when
exactly the same pair of one read and one write address is compared for equality in the
stalling logic. These transformations are applied entirely automatically by a tool for
formal verification of microprocessors, based on EUFM, the Burch and Dill flushing
technique [6], and the properties of Positive Equality [3]. An order of magnitude
reduction is achieved in the number of $e_{ij}$ Boolean variables [9] that encode the
equality comparisons of register identifiers in the correctness formulas for single-issue
pipelined and dual-issue superscalar microprocessors with multicycle functional units,
exceptions, and branch prediction. This results in up to a 40x reduction in the CPU time
for the formal verification of the dual-issue superscalar microprocessors.



1 Introduction 

The motivation for this work is the complexity of the formal verification of correct
microprocessors. The formal verification is done with the Burch and Dill flushing
technique [6] by exploiting the properties of Positive Equality [3] in order to translate
the correctness formula from the logic of Equality with Uninterpreted Functions and
Memories (EUFM) to propositional logic. The translation is done by a completely automatic
tool [16] [21]. The resulting Boolean formula can be checked for being a tautology with
either BDDs [2] or Boolean Satisfiability (SAT) checkers; a tautology implies that the
original EUFM correctness formula is universally valid, i.e., the processor is correct
under all possible conditions.

Recently we showed that errors in complex realistic microprocessors are detected in CPU
time that is up to orders of magnitude smaller than the time to prove the correctness of
a bug-free version of the same design [20]. The present paper aims to speed up the
verification of correct microprocessors with multicycle functional units, exceptions, and
branch prediction, where reads and writes of user-visible state are not reordered and
occur according to their program sequence.

2 Background

In this work, the logic of EUFM [6] is used for the definition of high-level models of
both the implementation and the specification microprocessors. The syntax of EUFM
includes terms and formulas. Terms are used in order to abstract word-level values of
data, register identifiers, memory addresses, as well as the entire states of memories.
A term can be an Uninterpreted Function (UF) applied on a list of argument terms, a
domain variable, or an ITE operator selecting between two argument terms based on a
controlling formula, such that $\mathit{ITE}(\mathit{formula}, \mathit{term}_1, \mathit{term}_2)$
will evaluate to $\mathit{term}_1$ when $\mathit{formula} = \mathit{true}$ and to
$\mathit{term}_2$ when $\mathit{formula} = \mathit{false}$. Formulas are used in order to
model the control path of a microprocessor, as well as to express the correctness
condition. A formula can be an Uninterpreted Predicate (UP) applied on a list of argument
terms, a propositional variable, an ITE operator selecting between two argument formulas
based on a controlling formula, or an equation (equality comparison) of two terms.
Formulas can be negated and connected by Boolean connectives.

UFs and UPs are used to abstract away the implementation details of functional units by
replacing them with “black boxes” that satisfy no particular properties other than that
of functional consistency: the same combinations of values to the inputs of the UF (or
UP) produce the same output value. Three possible ways to impose the property of
functional consistency of UFs and UPs are Ackermann constraints [1], nested ITEs [3]
[16], and “pushing-to-the-leaves” [16]. In the nested ITEs scheme, the first application
of some UF, $f(a_1, b_1)$, is replaced by a new domain variable $c_1$. A second
application, $f(a_2, b_2)$, is replaced by
$\mathit{ITE}((a_2 = a_1) \wedge (b_2 = b_1), c_1, c_2)$, where $c_2$ is a new domain
variable. A third one, $f(a_3, b_3)$, is replaced by
$\mathit{ITE}((a_3 = a_1) \wedge (b_3 = b_1), c_1, \mathit{ITE}((a_3 = a_2) \wedge (b_3 = b_2), c_2, c_3))$,
where $c_3$ is a new domain variable, and so on.
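
The nested ITEs scheme can be illustrated with a minimal Python sketch. The tuple encoding of expressions, the function name eliminate_uf, and the variable names c1, c2, ... are assumptions made for illustration only; they are not taken from the verification tool.

```python
from itertools import count


def eliminate_uf(applications):
    """Replace each application f(a, b) of one UF by a nested-ITE expression.

    `applications` lists the argument tuples in order of appearance.
    Expressions are nested tuples ("ITE", condition, then_expr, else_expr);
    fresh domain variables are the strings "c1", "c2", ... (hypothetical names).
    """
    fresh = (f"c{i}" for i in count(1))
    seen = []      # (arguments, fresh variable) of the earlier applications
    result = []
    for args in applications:
        var = next(fresh)
        expr = var
        # Build from the innermost ITE outwards, so that the earliest
        # application ends up being tested first.
        for prev_args, prev_var in reversed(seen):
            cond = ("AND",) + tuple(("EQ", x, y) for x, y in zip(args, prev_args))
            expr = ("ITE", cond, prev_var, expr)
        seen.append((args, var))
        result.append(expr)
    return result


replaced = eliminate_uf([("a1", "b1"), ("a2", "b2"), ("a3", "b3")])
```

Each later application reuses the variable of an earlier one exactly when all arguments are equal, which is how functional consistency is enforced without interpreting f.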

The syntax for terms can be extended to model memories by means of the functions read and
write, where read takes 2 argument terms serving as memory and address, respectively,
while write takes 3 argument terms serving as memory, address, and data. Both functions
return a term. Also, they can be viewed as a special class of (partially interpreted)
uninterpreted functions in that they are defined to satisfy the forwarding property of
the memory semantics, namely that
$\mathit{read}(\mathit{write}(\mathit{mem}, aw, d), ar) = \mathit{ITE}(ar = aw, d, \mathit{read}(\mathit{mem}, ar))$,
in addition to the property of functional consistency. Versions of read and write that
extend the syntax for formulas can be defined similarly, such that the version of read
will return a formula and the version of write will take a formula as its third argument.
Both terms and formulas are called expressions.
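
The forwarding property can be checked on a concrete memory model. The sketch below assumes a dictionary-based memory over a small address range; the helper names and the chosen domains are illustrative assumptions, not part of EUFM.

```python
from itertools import product


def write(mem, addr, data):
    """Return a new memory state that equals mem except at addr."""
    new_mem = dict(mem)
    new_mem[addr] = data
    return new_mem


def read(mem, addr):
    """Return the data stored at addr."""
    return mem[addr]


def forwarding_property_holds(addresses=(0, 1, 2), values=("x", "y")):
    """Check read(write(mem, aw, d), ar) = ITE(ar = aw, d, read(mem, ar))."""
    mem = {a: f"init{a}" for a in addresses}
    for aw, ar, d in product(addresses, addresses, values):
        lhs = read(write(mem, aw, d), ar)
        rhs = d if ar == aw else read(mem, ar)
        if lhs != rhs:
            return False
    return True
```

An arbitrary uninterpreted function in place of read or write would carry no such guarantee, which is why abstracting them loses the forwarding property.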

The correctness criterion is a commutative diagram [6]. It requires that one step of the
Implementation transition function followed by flushing should produce equal user-visible
state as first flushing the Implementation and then using the resulting user-visible
state to apply the Specification transition function between 0 and k times, where k is
the issue-width of the Implementation. Flushing of the processor is done by feeding it
with bubbles until all instructions in flight complete their execution, computing an
abstraction function that maps Implementation states to a Specification state. (The
difference between a bubble and a nop is that a bubble does not modify any user-visible
state, while a nop increments the PC.) The correctness criterion is expressed by an EUFM
formula of the form:

$m_{1,0} \wedge m_{2,0} \wedge \ldots \wedge m_{n,0} \;\vee\; m_{1,1} \wedge m_{2,1} \wedge \ldots \wedge m_{n,1} \;\vee\; \ldots \;\vee\; m_{1,k} \wedge m_{2,k} \wedge \ldots \wedge m_{n,k}$,    (1)

where n is the number of user-visible state elements in the implementation processor, k
is the maximum number of instructions that the processor can fetch in a clock cycle, and
$m_{i,j}$, $1 \le i \le n$, $0 \le j \le k$, is an EUFM formula expressing the condition
that user-visible state element i is updated by the first j instructions from the ones
fetched in a single clock cycle. (See the electronic version of [16] for a detailed
discussion.) The EUFM formulas $m_{1,j}, m_{2,j}, \ldots, m_{n,j}$, $0 \le j \le k$, are
conjuncted in order to ensure that the user-visible state elements are updated in “sync”
by the same number of instructions. The correctness criterion expresses a safety property
that the processor completes between 0 and k of the newly fetched k instructions.
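
The shape of formula (1) can be made concrete with a small Python sketch that evaluates it for given truth values. The function name correctness and the list-of-lists layout m[i][j] are assumptions chosen for illustration.

```python
def correctness(m):
    """Evaluate formula (1) for concrete truth values.

    m[i][j] is True when user-visible state element i matches the
    specification state reached after j instructions (j = 0 .. k).
    """
    n = len(m)
    k_plus_1 = len(m[0])
    # A disjunction over j of the conjunction over all state elements i.
    return any(all(m[i][j] for i in range(n)) for j in range(k_plus_1))


# Both elements agree that 1 instruction completed: the criterion holds.
in_sync = correctness([[False, True], [False, True]])
# The elements match for different j, so they are out of sync.
out_of_sync = correctness([[True, False], [False, True]])
```

The second call shows why the formulas for a fixed j are conjuncted: each state element matching some specification state is not enough unless all of them match for the same number of completed instructions.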

Positive Equality allows the identification of two types of terms in the structure of an
EUFM formula: those which appear only in positive equations and are called p-terms, and
those which can appear in both positive and negative equations and are called g-terms
(for general terms). A positive equation is never negated (or appears under an even
number of negations) and is not part of the controlling formula for an ITE operator. A
negative equation appears under an odd number of negations or as part of the controlling
formula for an ITE operator. The computational efficiency from exploiting Positive
Equality is due to a theorem which states that the truth of an EUFM formula under a
maximally diverse interpretation of the p-terms implies the truth of the formula under
any interpretation. The classification of p-terms vs. g-terms is done before UFs and UPs
are eliminated by nested ITEs, such that if an UF is classified as a p-term (g-term), the
new domain variables generated for its elimination are also considered to be p-terms
(g-terms). After the UFs and the UPs are eliminated, a maximally diverse interpretation
is one where: the equality comparison of two syntactically identical (i.e., exactly the
same) domain variables evaluates to true; the equality comparison of a p-term domain
variable with a syntactically distinct domain variable evaluates to false; and the
equality comparison of a g-term domain variable with a syntactically distinct g-term
domain variable could evaluate to either true or false and can be encoded with a
dedicated Boolean variable, an $e_{ij}$ variable [9].

In order to fully exploit the benefits of Positive Equality, the designer of a high-level
processor must use a set of suitable abstractions and conservative approximations. For
example, an equality comparison of two data operands, as used to determine the condition
to take a branch-on-equal instruction, must be abstracted with an UP in both the
Implementation and the Specification, so that the data operand terms will not appear in
negated equations but only as arguments to UPs and UFs and hence will be classified as
p-terms. Similarly, a Finite State Machine (FSM) model of a memory has to be employed for
abstracting the Data Memory in order for the addresses, which are produced by the ALU and
also serve as data operands, to be classified as p-terms. In the FSM abstraction of a
memory, the present memory state is a term that is stored in a latch. Reads are modeled
with an UF $f_r$ that depends on the present memory state and the address, while
producing a term for the read data. Writes are modeled with an UF $f_u$ that depends on
the present memory state, the address, and a data term, producing a term for the new
memory state, which is to be stored in the latch. The result is that data values produced
by the Register File, the ALU, and the Data Memory can be classified as p-terms, while
only the register identifiers, whose equations control forwarding and stalling conditions
that can be negated, are classified as g-terms.
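
The FSM abstraction of a memory can be sketched in Python as a latch holding a symbolic state term. The class name AbstractMemory, the tuple encoding, and the tags "f_r" and "f_u" are illustrative assumptions; the sketch only builds terms and does not interpret them.

```python
class AbstractMemory:
    """FSM model of a memory: the latch holds a term for the present state."""

    def __init__(self, initial_state):
        self.state = initial_state          # domain variable, e.g. "DMem"

    def read(self, address):
        # Read data is an uninterpreted function of the state and the address.
        return ("f_r", self.state, address)

    def write(self, address, data, enable=None):
        # The new state is an uninterpreted function of state, address, data.
        updated = ("f_u", self.state, address, data)
        if enable is None:                  # unconditional write
            self.state = updated
        else:                               # write guarded by formula `enable`
            self.state = ("ITE", enable, updated, self.state)
        return self.state


mem = AbstractMemory("DMem")
mem.write("addr1", "data1")
loaded = mem.read("addr2")
```

Because addr1 and addr2 occur only as arguments of the uninterpreted functions, no equation between them is generated, which is what allows them to be classified as p-terms.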

We will refer to a transformation on the implementation and specification processors as a
conservative approximation if it omits some properties, making the new processor models
more general than the original ones. Note that the same transformation is applied to both
the implementation and the specification processors. If the more general model of the
implementation is verified against the more general model of the specification, then so
is the original implementation against the original specification, whose additional
properties were not necessary for the verification.

Proposition 1. The FSM model of a memory, based on uninterpreted functions $f_u$ and
$f_r$, is a conservative approximation of a memory.

Proof. If a processor is proved correct with the FSM model of a memory where the update
function $f_u$ and the read function $f_r$ are completely arbitrary uninterpreted
functions that do not satisfy the forwarding property of the memory semantics, then the
processor will be correct for any implementation of $f_u$ and $f_r$, including
$f_u = \mathit{write}$ and $f_r = \mathit{read}$. □



3 Automatic Abstraction of Memories 

When abstracting memories in [19], the following transformations were applied auto- 
matically by the verification tool when processing the EUFM correctness formula, 
starting from the leaves of that formula: 

$\mathit{read}(m, a) \rightarrow f_r(m, a)$    (2)

$\mathit{write}(m, a, d) \rightarrow f_u(m, a, d)$    (3)

$\mathit{ITE}(e \wedge (ra = wa), d, f_r(m, ra)) \rightarrow f_r(\mathit{ITE}(e, f_u(m, wa, d), m), ra)$    (4)

Transformations (2) and (3) are the same as those used in the abstraction of the Data
Memory, described in Sect. 2. Transformation (4) occurs in the cases when one level of
forwarding logic is used to update the data read from address ra of the previous state m
for the memory, where function read is already abstracted with UF $f_r$. Accounting for
the forwarding property of the memory semantics that was satisfied before function read
was abstracted with UF $f_r$, the left-hand side of (4) is equivalent to
$\mathit{read}(\mathit{ITE}(e, \mathit{write}(m, wa, d), m), ra)$, i.e., to a read from
address ra of the state of memory m after a write to address wa with data d is done under
the condition that formula e is true. On the right-hand side of (4), functions read and
write are again abstracted with $f_r$ and $f_u$ after accounting for the forwarding
property. Multiple levels of forwarding are abstracted by recursive applications of (4),
starting from the leaves of the correctness formula. Uninterpreted functions $f_u$ and
$f_r$ can be automatically made unique for every memory, where a memory is identified by
a unique domain variable serving as the memory argument at the leaves of a memory state
term. Hence, $f_u$ and $f_r$ will no longer be functionally consistent across memories,
which is a conservative approximation.

After all memories were automatically abstracted as presented above, the tool in [19]
checked if an address term for an abstracted memory was still used in a negated equation,
i.e., was a g-term. If so, then the abstraction for that memory was undone. Hence,
abstraction was performed automatically only for a memory whose addresses are p-terms
outside the memory and the forwarding logic for it. From Proposition 1, it follows that
such an abstraction is a conservative approximation. The condition that address terms of
abstracted memories are used only as p-terms outside the abstracted memories avoids false
negatives that might result when a (negated) equation of two address terms will imply
that a write to one of the addresses will (not) affect a read from the other address in
the equation when that read is performed later, a property that is lost in the
abstraction with UFs. In the architecture verified in [19], transformations (2) - (4)
worked for the Branch-Address Register File, whose forwarding logic did not interact with
stalling logic for preserving the correctness of the memory semantics for that register
file, as the branch-address results were available for forwarding right after the Execute
stage. However, the above abstractions were not applicable to the Integer and
Floating-Point Register Files that did have stalling logic interact with their forwarding
logic.

The contribution made with this paper is the idea of a hybrid memory model, where the
forwarding property of the memory semantics is satisfied fully for only those levels of
forwarding where exactly the same pair of one read and one write address is compared for
equality outside the abstracted memory, i.e., in the EUFM formula resulting after the
application of transformations (2) - (4). We will refer to addresses compared in general
equations outside an abstracted memory as control addresses, and will call those general
equations control equations. When the read address in a level of forwarding is a control
address, but the write address either is not a control address or does not appear in a
control equation together with the read address, that level of forwarding is abstracted
with uninterpreted function $f_{ud}$ (where “ud” stands for “update data”). $f_{ud}$
takes 4 argument terms (write address wa, write data wd, read address ra, and data rd
read from the previous memory state before the write) such that the functionality
abstracted with $f_{ud}(wa, wd, ra, rd)$ is $\mathit{ITE}(ra = wa, wd, rd)$. Finally,
when the read address is not a control address, the read is abstracted as before based on
transformations (2) - (4).

Note that the initial state of the pipeline latches in the implementation processor
consists of a domain variable for every term signal (including register identifiers) and
a Boolean variable for every Boolean signal. Hence, the equality comparisons of register
identifiers done in the stalling logic during the single cycle of regular symbolic
simulation along the implementation side of the commutative correctness diagram will be
at the level of domain variables serving as register identifiers. Furthermore, using
Burch’s controlled flushing [7], it is possible to flush the implementation processor
without introducing additional register id equality comparisons due to the stalling
logic. In controlled flushing, instructions are artificially stalled by overriding the
processor stall signals with user-controlled auxiliary inputs until it is guaranteed that
the stall signals will evaluate to false, i.e., all the data operand values can be
provided correctly by the forwarding logic or can be read directly from a register file.
Note that we can modify the processor logic during flushing, as all that logic does then
is to complete the partially executed instructions. Mistakes in such modifications can
only result in false negatives. The symbolic simulation of the non-pipelined
specification processor along the specification side of the commutative correctness
diagram does not result in additional control equations, as all data operands are read
directly from the register files. Therefore, using controlled flushing and applying
transformations (2) - (4) will result in an EUFM correctness formula with only those
general (control) equations over register identifier terms that are introduced by the
stalling logic in the single cycle of regular symbolic simulation in order to preserve
the correctness of the memory semantics for a register file. The exact abstraction steps
are presented next.

Algorithm for Applying the Hybrid Memory Model: 

1. Abstract all memories:

   1.1 use rules (2) - (4) to abstract memories extended with forwarding logic;

   1.2 use UF $f_{ud}$ to abstract levels of forwarding where the initial data is not
       read from a memory:

       $\mathit{ITE}(e \wedge (ra = wa), d, d_0) \rightarrow \mathit{ITE}(e, f_{ud}(wa, d, ra, d_0), d_0)$    (5)

       where $d_0$ is neither an application of UF $f_r$ nor an ITE expression that has
       an application of $f_r$ among its leaves;

   1.3 identify the control equations and control addresses.

2. For all applications of UF $f_r$ whose address term is an ITE expression, push $f_r$
   to the leaves of the address term:

       $f_r(m, \mathit{ITE}(e, ra_1, ra_2)) \rightarrow \mathit{ITE}(e, f_r(m, ra_1), f_r(m, ra_2))$    (6)

   until every address argument of $f_r$ becomes a domain variable. If an address
   argument to $f_r$ is an application of an UF, then use the nested ITEs scheme to
   eliminate it and again apply (6) recursively.

3. For those applications $f_r(m, ra)$, where the address term ra is a control address
   and m is of the form $\mathit{ITE}(e, f_u(m_0, wa, wd), m_0)$, do:

   3.1 if the write address wa is an ITE expression, $\mathit{ITE}(c, wa_1, wa_2)$, where
       some of the leaf terms are control addresses compared for equality to ra in a
       control equation, then apply the transformation:

       $f_r(\mathit{ITE}(e, f_u(m_0, \mathit{ITE}(c, wa_1, wa_2), wd), m_0), ra) \rightarrow \mathit{ITE}(c, f_r(\mathit{ITE}(e, f_u(m_0, wa_1, wd), m_0), ra), f_r(\mathit{ITE}(e, f_u(m_0, wa_2, wd), m_0), ra))$    (7)

   3.2 else, if the write address wa is a control address compared for equality to ra in
       a control equation, then apply the transformation:

       $f_r(\mathit{ITE}(e, f_u(m_0, wa, wd), m_0), ra) \rightarrow \mathit{ITE}(e \wedge (ra = wa), wd, f_r(m_0, ra))$    (8)

   3.3 else, apply the transformation:

       $f_r(\mathit{ITE}(e, f_u(m_0, wa, wd), m_0), ra) \rightarrow \mathit{ITE}(e, f_{ud}(wa, wd, ra, f_r(m_0, ra)), f_r(m_0, ra))$    (9)

   until every memory argument of $f_r$ becomes a domain variable, i.e., is the initial
   state of a memory.

Note that Step 3 relies on the assumption that every write to a memory is done under 
the condition that an enabling formula e is true. However, an unconditional write is the 
case when e = true. The soundness of the above transformations is proved as follows. 
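
Step 3 of the algorithm can be sketched as a recursive term rewriter in Python. This is only an illustration under stated assumptions: terms are nested tuples with strings as domain and Boolean variables, control equations are given as a set of unordered address pairs, every write is guarded by an enabling formula, and the names step3, leaves, and the tuple tags are hypothetical rather than taken from the tool.

```python
def leaves(term):
    """Return the leaf terms of a (possibly nested) ITE address expression."""
    if isinstance(term, tuple) and term[0] == "ITE":
        return leaves(term[2]) | leaves(term[3])
    return {term}


def step3(read_term, control_eqs):
    """Rewrite f_r(m, ra) until its memory argument is a domain variable.

    Term encodings (assumed): ("f_r", m, ra), ("f_u", m, wa, wd),
    ("f_ud", wa, wd, ra, rd), ("ITE", cond, then_term, else_term),
    ("AND", c1, c2), ("EQ", a, b).
    control_eqs is a set of frozenset({a, b}) pairs of control addresses.
    """
    _, mem, ra = read_term
    if isinstance(mem, str):                 # initial memory state: done
        return read_term
    # Assumed shape of the memory state: ITE(e, f_u(m0, wa, wd), m0)
    _, e, (_, m0, wa, wd), _ = mem

    def in_control_eq(addr):
        return frozenset((ra, addr)) in control_eqs

    if (isinstance(wa, tuple) and wa[0] == "ITE"
            and any(in_control_eq(leaf) for leaf in leaves(wa))):
        _, c, wa1, wa2 = wa                  # rule (7): split on the write address
        return ("ITE", c,
                step3(("f_r", ("ITE", e, ("f_u", m0, wa1, wd), m0), ra), control_eqs),
                step3(("f_r", ("ITE", e, ("f_u", m0, wa2, wd), m0), ra), control_eqs))
    previous = step3(("f_r", m0, ra), control_eqs)   # read from the earlier state
    if in_control_eq(wa):                    # rule (8): keep the control equation
        return ("ITE", ("AND", e, ("EQ", ra, wa)), wd, previous)
    return ("ITE", e, ("f_ud", wa, wd, ra, previous), previous)   # rule (9)


# Three writes in flight; only IF_EX_DestReg is compared with SrcReg outside.
CONTROL = {frozenset(("IF_EX_DestReg", "SrcReg"))}
reg0 = ("ITE", "D_WB_Valid", ("f_u", "RegFile", "D_WB_DestReg", "D_WB_Result"), "RegFile")
reg1 = ("ITE", "EX_D_Valid", ("f_u", reg0, "EX_D_DestReg", "EX_D_Result"), reg0)
spec0 = ("ITE", "IF_EX_Valid", ("f_u", reg1, "IF_EX_DestReg", "Result0"), reg1)
data0 = step3(("f_r", spec0, "SrcReg"), CONTROL)
```

In this usage example only the outermost write produces an equation, through rule (8); the two older writes are absorbed into applications of f_ud by rule (9), so their destination registers appear only as arguments of an uninterpreted function.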

Proposition 2. The hybrid memory model, based on uninterpreted functions $f_u$, $f_r$,
and $f_{ud}$, is a conservative approximation of a memory.

Proof. If a processor is proved correct with arbitrary uninterpreted functions $f_u$,
$f_r$, and $f_{ud}$, the processor will be correct for any implementation of these
uninterpreted functions, including $f_u(m, wa, wd) = \mathit{write}(m, wa, wd)$,
$f_r(m, ra) = \mathit{read}(m, ra)$, and
$f_{ud}(wa, wd, ra, rd) = \mathit{ITE}(ra = wa, wd, rd)$ that satisfy the complete memory
semantics. □

Transformations (2) - (9) separate the effects of the forwarding and stalling logic, 
modeling conservatively their interaction. The result is a dramatic reduction in the 
number of Boolean variables required for encoding the equality comparisons of 
g-term domain variables, as now most of the register identifier terms are used only as 
inputs to uninterpreted functions, i.e., become p-terms. Indeed, only transformation (8) 
introduces a general equation over register identifiers. However, exactly the same 
equation already exists in the EUFM formula outside the abstracted memory, so that 
the number of general equations over register identifiers is equal to the number of 
equations generated by the stalling logic in the single cycle of regular symbolic simu- 
lation of the implementation processor. Hence, the translation of the EUFM correct- 
ness formula to propositional logic by applying transformations (2) - (9) will depend 
on significantly fewer Boolean variables and its tautology checking will be done much 
faster, compared to the case when these transformations are not applied. 

In our previous work [16] [17] [19], the two final memory states reached along the two
sides of the commutative diagram were checked for equality by generating a new domain
variable, performing a read from that address of both final states, and comparing for
equality the two resulting data terms. That scheme introduces additional Boolean
variables encoding the equality of the final read address with each of the write
addresses. The advantage of that comparison method is that it can account for the
property of transitivity of equality for the register ids [4] [5]. Although that property
is not required for the correct benchmarks used in this paper, it is needed in order to
avoid false negatives for buggy versions, as well as when verifying out-of-order
superscalar processors [20]. In the present paper, the final memory states are compared
for equality by applying transformation (3) to the final memory state terms, i.e.,
abstracting function write with UF $f_u$, and directly comparing for equality the
resulting final memory state terms. Indeed, what is verified is that the same sequence of
updates is performed under all conditions by both sides of the commutative diagram.

Transformations (2) - (9) are based on the assumption that reads and writes are 
not reordered and occur in the same sequence along the two sides of the commutative 
correctness diagram — the case in microprocessors with in-order execution. 

4 Example 

The transformation rules will be illustrated on the pipelined processor in Fig. 1. It can
execute only register-register instructions and has 4 stages: Instruction Fetch (IF),
Execute (EX), a Dummy stage (D), and Write-Back (WB). Since there is no forwarding logic
to bypass the result of the instruction in D to the instruction in EX, a newly-fetched
instruction is stalled if it has a data dependency on the preceding instruction in EX.
When set to true, signal Flush stops the instruction fetching and inserts bubbles in the
pipeline, thus flushing it by allowing partially executed instructions to complete [6].
The instruction memory, IMem, is read-only and is abstracted with 3 UFs, one for each of
the instruction fields source register (SrcReg), op-code (Op), and destination register
(DestReg), and 1 UP for the valid bit (Valid). UFs ALU and +4 abstract, respectively, the
ALU in EX and the PC incrementer in IF. The register file, RegFile, is write-before-read,
i.e., the newly-fetched instruction will be able to read the data written by the
instruction in WB. The user-visible state elements are PC and RegFile. The processor is
compared against a non-pipelined specification (not shown) that consists of the same
user-visible state, UFs, and UP. It does not have the 3 pipeline latches, stalling logic,
or forwarding logic.







Fig. 1. Block diagram of a 4-stage pipelined processor. 



Flushing the pipeline takes 3 clock cycles, as there can be up to 3 instructions in
flight. In order to define the specification behavior, the implementation processor is
first flushed, reaching the initial specification state (PC_Spec_0, RegFile_Spec_0), as
shown below. The specification processor is then exercised for 1 step from that state,
reaching specification state (PC_Spec_1, RegFile_Spec_1). SrcReg, DestReg, and Op are new
domain variables and Valid is a new Boolean variable used to eliminate the single
applications of, respectively, the three UFs and one UP that abstract the IMem. Domain
variables PC and RegFile represent the initial state of the corresponding state element.
Domain and Boolean variables with prefixes IF_EX_, EX_D_, and D_WB_ represent the initial
state of the corresponding pipeline latch. Functions read and write are already
abstracted with UFs $f_r$ and $f_u$, respectively, according to (2) and (3). The left
arrow means assignment.



RegFile_0       ← ITE(D_WB_Valid, f_u(RegFile, D_WB_DestReg, D_WB_Result), RegFile)
RegFile_1       ← ITE(EX_D_Valid, f_u(RegFile_0, EX_D_DestReg, EX_D_Result), RegFile_0)
forward_0       ← (IF_EX_SrcReg = D_WB_DestReg) ∧ D_WB_Valid
ALU_Data_0      ← ITE(forward_0, D_WB_Result, IF_EX_Data)
Result_0        ← ALU(IF_EX_Op, ALU_Data_0)
RegFile_Spec_0  ← ITE(IF_EX_Valid, f_u(RegFile_1, IF_EX_DestReg, Result_0), RegFile_1)
PC_Spec_0       ← PC
Data_0          ← f_r(RegFile_Spec_0, SrcReg)
Result_1        ← ALU(Op, Data_0)
RegFile_Spec_1  ← ITE(Valid, f_u(RegFile_Spec_0, DestReg, Result_1), RegFile_Spec_0)
PC_Spec_1       ← +4(PC)

The behavior of the implementation processor is captured by one cycle of regular sym- 
bolic simulation (Flush is set to false), followed by flushing: 

Stall_bar     ← ¬(IF_EX_Valid ∧ Valid ∧ (IF_EX_DestReg = SrcReg))
IF_Valid      ← Valid ∧ Stall_bar
Data_1        ← f_r(RegFile_0, SrcReg)
ALU_Data_1    ← ITE(EX_D_Valid ∧ (SrcReg = EX_D_DestReg), EX_D_Result, Data_1)
Result_2      ← ALU(Op, ALU_Data_1)
RegFile_Impl  ← ITE(IF_Valid, f_u(RegFile_Spec_0, DestReg, Result_2), RegFile_Spec_0)
PC_Impl       ← ITE(Stall_bar, +4(PC), PC)

The correctness formula is defined according to (1), given that the processor can fetch
up to 1 new instruction and has 2 user-visible state elements, PC and RegFile:

m_PC,0        ← (PC_Spec_0 = PC_Impl)
m_RegFile,0   ← (RegFile_Spec_0 = RegFile_Impl)
m_PC,1        ← (PC_Spec_1 = PC_Impl)
m_RegFile,1   ← (RegFile_Spec_1 = RegFile_Impl)
correctness   ← m_PC,0 ∧ m_RegFile,0 ∨ m_PC,1 ∧ m_RegFile,1

All of the above expressions are generated by symbolically simulating the implemen- 
tation and specification processors with a term-level symbolic simulator [21]. 

Applying rule (4) to expression ALU_Data_1, we get:

ALU_Data_1  ← f_r(RegFile_1, SrcReg)

This is achieved by using a unique-expression hash table [16], so that

ITE(EX_D_Valid, f_u(RegFile_0, EX_D_DestReg, EX_D_Result), RegFile_0)

is identified as the existing identical expression RegFile_1.

UF $f_{ud}$ is next applied in order to abstract the one level of forwarding that affects
expression ALU_Data_0 (Step 1.2 of the algorithm for the hybrid memory model):

temp_0      ← f_ud(D_WB_DestReg, D_WB_Result, IF_EX_SrcReg, IF_EX_Data)
ALU_Data_0  ← ITE(D_WB_Valid, temp_0, IF_EX_Data)

As a result, the only general equation left in the correctness formula is
(IF_EX_DestReg = SrcReg) in expression Stall_bar, where both IF_EX_DestReg and SrcReg are
addresses for the abstracted memory RegFile, i.e., they are used as address arguments in
applications of $f_r$ and $f_u$ where the initial memory state is domain variable
RegFile. Hence, that equation is a control equation and domain variables IF_EX_DestReg
and SrcReg are control addresses.

Recursively applying Step 3 of the algorithm for the hybrid memory model to expressions
Data_0 and ALU_Data_1, where the address term SrcReg of $f_r$ is a control address, we
get:

temp_1      ← f_r(RegFile, SrcReg)
temp_2      ← f_ud(D_WB_DestReg, D_WB_Result, SrcReg, temp_1)
temp_3      ← ITE(D_WB_Valid, temp_2, temp_1)
temp_4      ← f_ud(EX_D_DestReg, EX_D_Result, SrcReg, temp_3)
temp_5      ← ITE(EX_D_Valid, temp_4, temp_3)
temp_6      ← IF_EX_Valid ∧ (IF_EX_DestReg = SrcReg)
Data_0      ← ITE(temp_6, Result_0, temp_5)
ALU_Data_1  ← temp_5



Now the only general equation left in the correctness formula is (IF_EX_DestReg =
SrcReg). Hence, only these two address terms will be g-terms, while the rest will be
p-terms, and just one Boolean variable will be introduced. In contrast, there will be 8
$e_{ij}$ variables if the hybrid memory model is not applied: 1 for encoding the above
equation; 3 for the equations between SrcReg and each of the destination registers used
for writes within RegFile_Spec_0 in order to account for the forwarding property when
read(RegFile_Spec_0, SrcReg) is eliminated (see Sect. 2); and 4 for equations between a
new domain variable and each of the 4 destination registers used for writes to the
RegFile when its final states are compared for equality (see Sect. 3). The reduction in
the number of variables and the speedup become dramatic for complex microprocessors, as
shown in Sect. 6.



5 Rewriting Rules for Processors with Multicycle ALUs 

Processors with multicycle ALUs require applying an additional transformation on the EUFM
correctness formula before the rules from Sect. 3. The problem stems from the uncertain
advance, as controlled by signal Done, of the instruction in the Execute stage during the
single cycle of regular symbolic simulation (see Fig. 2). Multicycle ALUs are abstracted
with a place holder [17] [18], where an UF is used to abstract the functionality, and a
new Boolean variable is introduced to express the non-deterministic completion of the
computation during each cycle. The place holder is forced to complete a new computation
on every cycle during flushing. As noted in Sect. 3, modifying the processor during
flushing can only result in false negatives.





Fig. 2. The Execute stage of a pipelined processor with a multicycle ALU. 

Let c be the value of signal Done during the single cycle of regular symbolic
simulation. Then, on the next cycle, EX_Data will have the expression
$\mathit{ITE}(c, \mathit{read}(m, ra), d_0)$, where $\mathit{read}(m, ra)$ is data that
has been read from the register file by the next instruction, and $d_0$ is a domain
variable for the initial value of EX_Data. Similarly, EX_SrcReg will have the expression
$\mathit{ITE}(c, ra, a_0)$, where $ra$ is the source register for the next instruction
and $a_0$ is a domain variable for the initial value of that signal.
Assuming one level of forwarding, ALU_Data will have an expression of the kind:

$\mathit{ITE}(e \wedge (wa = \mathit{ITE}(c, ra, a_0)),\ d,\ \mathit{ITE}(c, \mathit{read}(m, ra), d_0))$   (10)

where $wa$, $d$, and $e$ are, respectively, the destination register, data, and enabling
condition for a write in flight to the register file. Since the above expression does not
exactly match either of rules (4) and (5), it is rewritten by pulling c to the top of the
expression and simplifying the read address along each branch of the new top-level ITE:

$\mathit{ITE}(c,\ \mathit{ITE}(e \wedge (wa = ra),\ d,\ \mathit{read}(m, ra)),\ \mathit{ITE}(e \wedge (wa = a_0),\ d,\ d_0))$   (11)

Now the then-expression (selected when c = true) of the top-level ITE can be rewritten
using rule (4), while the else-expression can be rewritten using rule (5).
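
The rewrite from (10) to (11) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: expressions are modeled as nested tuples, the helper names `subst` and `pull_to_top` are hypothetical, and the symbol names `wa`, `d`, `a0`, `d0` are placeholders for the write address, the write data, and the two initial-value domain variables. None of this is taken from the tool [21].

```python
def subst(expr, var, value):
    """Replace Boolean variable `var` by the constant `value`, simplifying ITEs."""
    if expr == var:
        return value
    if not isinstance(expr, tuple):
        return expr  # a leaf symbol other than `var`
    op, *args = expr
    args = [subst(a, var, value) for a in args]
    if op == "ITE" and args[0] is True:
        return args[1]  # ITE(true, t, e) simplifies to t
    if op == "ITE" and args[0] is False:
        return args[2]  # ITE(false, t, e) simplifies to e
    return (op, *args)


def pull_to_top(expr, var):
    """Case-split on `var`: expr becomes ITE(var, expr[var:=true], expr[var:=false])."""
    return ("ITE", var, subst(expr, var, True), subst(expr, var, False))


# Expression (10), with hypothetical symbol names (see the lead-in).
addr = ("ITE", "c", "ra", "a0")
data = ("ITE", "c", ("read", "m", "ra"), "d0")
expr10 = ("ITE", ("and", "e", ("=", "wa", addr)), "d", data)

# Expression (11): c pulled to the top, read address simplified per branch.
expr11 = pull_to_top(expr10, "c")
```

After the case split, each branch of the top-level ITE compares `wa` with a single address term, which is the shape that rules (4) and (5) expect.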

6 Experimental Results 

The benchmarks used for the experiments are the same as in our previous work [17]: 

1xDLX-C: A single-issue, 5-stage, pipelined DLX processor [10], capable of executing
the 6 instruction types of register-register, register-immediate, load, store, branch,
and jump. The 5 stages are Fetch, Decode, Execute, Memory, and Write-Back.
Forwarding is used to bypass the data results from the Memory and Write-Back stages to
the functional units in the Execute stage. However, forwarding is impossible when a 
load provides data for the immediately following instruction. In such cases, the data 
hazard is prevented by a load interlock that stalls the dependent instruction in Decode 
until the load completes and its result can be forwarded from Write-Back when the 
dependent instruction is in Execute. There are 2 stalling conditions that can trigger a 
load interlock — one for each of the two source registers of the instruction in Decode. 
Stalling due to the second source register is done only when its data value is actually
used, i.e., when the instruction is not of type register-immediate, for which the
immediate data value is used instead.

2xDLX-CA: A dual-issue, superscalar DLX consisting of two 5-stage pipelines. The 
first pipeline is complete, i.e., capable of executing the 6 instruction types, while the 
second pipeline can execute only arithmetic (register-register and register-immediate) 
instructions. Since load instructions can be executed by the complete pipeline only, 
there is at most 1 load destination register in the Execute stage, but 4 source registers in 
the Decode stage for a total of 4 possible load interlock stalling conditions. If a load 
interlock is triggered due to the first instruction in Decode, both instructions in that 
stage get stalled, so that the processor fetches 0 new instructions. Else, under a load 
interlock due to the second instruction in Decode, only the first instruction in that stage 
is allowed to proceed to Execute, while the second moves to the first slot in Decode 
and the processor fetches only 1 new instruction to fill the second slot in Decode. 
Additionally, only the first instruction in Decode is allowed to proceed when its result 
is used by the second instruction in that stage, or when the second instruction is not 
arithmetic (i.e., there is a structural hazard), in which case that instruction has to be 
executed by the first pipeline. Hence, 0, 1, or 2 new instructions can be fetched each
cycle. This design is equivalent to Burch’s processor [7]. 

2xDLX-CC: A dual-issue, superscalar DLX with two complete 5-stage pipelines. 
Now 2 load destination registers in Execute could provide data for each of the 4 source 
registers in Decode, for a total of 8 load interlock stalling conditions. When a load 
interlock is triggered for one of the two instructions in Decode, or when the second 
instruction in that stage depends on the result of the first, the instructions in Decode 
proceed as in 2xDLX-CA. However, there is no structural hazard, as both pipelines are 
complete. Again, 0, 1, or 2 new instructions can be fetched each cycle. 

Each of the above processors also has extensions with: branch prediction, marked
“-BP”; multicycle functional units, “-MC,” where the Instruction Memory, the
ALUs in the Execute stage, and the Data Memory can each take multiple cycles to
produce a result; exceptions, “-EX,” where the Instruction Memory, the ALUs, and the
Data Memory can raise an exception; as well as combinations of these features. (For
detailed descriptions of these processors and their verification see [17] [18].)

The results are presented in Tables 1 and 2. The experiments were performed on a 
336 MHz Sun4 with 1.2 GB of memory. The Colorado University BDD package [8], 
and the sifting BDD variable reordering heuristic [14] were used to evaluate the final 
propositional formulas. The CPU times are reported for the sequence of symbolic
simulation, translation of the EUFM correctness formula to a propositional one, and
evaluation of the latter with BDDs. The ratios in the last columns of the tables are of the
CPU times before and after applying the automatic abstraction of the register file. 
Burch’s controlled flushing [7] was employed for all of the designs. 

The experiments with automatically abstracted register files were run with a 
breadth-first elimination of the UFs and UPs with the nested ITEs scheme (see Sect. 2)
when translating the EUFM correctness formula to propositional logic. A variant of
the fanin heuristic for BDD variable ordering [12] was used: all nodes in the proposi- 
tional logic DAG are sorted in descending order of their fanout counts; unless already 
created, the BDD for each node in that order is built in a depth-first manner, such that 
the inputs to each AND and OR gate are sorted in descending order of their topological 
levels and their BDDs are built in that order. The experiments without abstracting the 
register file were run with various heuristics and optimizations — no single strategy 
performed uniformly as the best across all benchmarks — and the statistics from the 
experiment with the minimum CPU time for each benchmark are reported. 
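
The build order described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the DAG encoding (`gates` as a dictionary from a gate to its inputs), the function name `build_order`, and the alphabetical tie-breaking are all assumptions made for the example.

```python
from collections import defaultdict


def build_order(gates):
    """Return the order in which BDDs for the DAG nodes would be built.

    gates: dict mapping each gate to the list of its input nodes;
           nodes without an entry are leaves (variables).
    """
    fanout = defaultdict(int)
    for inputs in gates.values():
        for i in inputs:
            fanout[i] += 1

    level = {}

    def topo_level(n):
        # Leaves have level 0; a gate is one above its deepest input.
        if n not in level:
            level[n] = 1 + max((topo_level(i) for i in gates.get(n, [])), default=-1)
        return level[n]

    nodes = set(gates) | {i for ins in gates.values() for i in ins}
    for n in nodes:
        topo_level(n)

    order, built = [], set()

    def build(n):
        if n in built:  # "unless already created"
            return
        # Depth-first: inputs in descending order of topological level.
        for i in sorted(gates.get(n, []), key=lambda x: (-level[x], x)):
            build(i)
        built.add(n)
        order.append(n)

    # Visit all nodes in descending order of fanout count.
    for n in sorted(nodes, key=lambda x: (-fanout[x], x)):
        build(n)
    return order


example = {"g1": ["a", "b"], "g2": ["g1", "c"], "out": ["g2", "g1"]}
order = build_order(example)
```

In this sketch the order in which the leaves first appear (`a`, `b`, `c`) would give a candidate BDD variable order.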

As Tables 1 and 2 show, automatically abstracting the register file reduces the number
of $e_{ij}$ register variables from up to 152 (for the most complex benchmark,
2xDLX-CC-MC-EX-BP) to at most 10. The $e_{ij}$ register variables
left after the abstraction are those that encode register id equality comparisons
made by the stalling logic only in the single cycle of regular symbolic simulation of
the implementation processor. In the case of the single-issue processors, the 2 source
registers in Decode are compared with the 1 possible load destination register in
Execute, for a total of 2 $e_{ij}$ register variables. 2xDLX-CA and its variants have 4 source
registers in Decode and still 1 possible load destination register in Execute (the second
pipeline cannot execute load instructions). Furthermore, the 2 source registers of the

                   Auto.  e_ij variables                 Max. BDD  Memory  CPU Time  CPU Time
Processor          Abs.  Reg.  Br. Total  Other  Total      Nodes    [MB]       [s]     Ratio

1xDLX-C            -       27    0    27     36     63      2,155     5.6      0.26      1.24
                   ✓        2    0     2     34     36        714     5.4      0.21
1xDLX-C-BP         -       27    8    35     41     76      3,408     5.7      0.36      1.24
                   ✓        2    8    10     39     49      2,224     5.5      0.29
1xDLX-C-MC         -       45    0    45     54     99      4,520     5.9      0.75      1.74
                   ✓        2    0     2     46     48      3,095     5.6      0.43
1xDLX-C-EX         -       27    0    27     64     91      7,122     6.4      1.16      1.21
                   ✓        2    0     2     64     66      5,364     6.0      0.96
1xDLX-C-MC-EX      -       36    0    36     77    113     18,108     6.5      4.53      2.35
                   ✓        2    0     2     76     78     10,313     6.4      1.93
1xDLX-C-MC-EX-BP   -       36   10    46     81    127     17,236     6.5      4.06       2.0
                   ✓        2   10    12     80     92     10,839     6.3      2.03

Table 1. Statistics for the formal verification of the single-issue pipelined processors.
“Auto. Abs.” stands for “automatically abstracted register file” (✓ = applied,
- = not applied). The columns from “Reg.” through the second “Total” count BDD variables.
The $e_{ij}$ “Reg.” variables are the ones that encode equality comparisons between
register identifiers. The $e_{ij}$ “Br.” variables are the ones that encode equality
comparisons between predicted and actual branch address targets.

second instruction in Decode are compared for equality with the destination register of
the first instruction in that stage, in order to avoid Read-After-Write hazards [10].
Hence, there are 6 $e_{ij}$ register variables for these benchmarks after abstracting the
register file. Processor 2xDLX-CC and its extensions can additionally have a load in the
Execute stage of the second pipeline, so that the load interlock logic also compares the
destination register of that instruction against the 4 source registers in Decode, for a
total of 10 $e_{ij}$ register variables. As expected, most of the register ids have become
p-terms after the automatic abstraction of the register file and no longer require
Boolean variables for encoding their equality comparisons with other register ids.
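
The counts of 2, 6, and 10 remaining register variables follow from simple arithmetic over the comparisons listed above. The sketch below only restates that arithmetic; the function name and parameter names are hypothetical.

```python
def remaining_register_variables(sources_in_decode, load_dests_in_execute,
                                 intra_decode_comparisons):
    """Count the register-id comparisons made by the stalling logic.

    Each source register in Decode is compared with each possible load
    destination register in Execute; dual-issue designs add the comparisons
    between the two instructions that sit in Decode together.
    """
    return sources_in_decode * load_dests_in_execute + intra_decode_comparisons


single_issue = remaining_register_variables(2, 1, 0)  # 1xDLX-C and variants
dual_ca = remaining_register_variables(4, 1, 2)       # 2xDLX-CA and variants
dual_cc = remaining_register_variables(4, 2, 2)       # 2xDLX-CC and variants
```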

The speedup for the single-issue pipelined processors is at most 2.0 times after 
applying the automatic abstraction of the register file, as these designs are relatively 
simple and could be verified very efficiently before the abstraction. Indeed, the extra 
time spent applying the transformation rules on the least complex benchmark, 
IxDLX-C, is approximately equal to the time saved in the BDD-based evaluation of 
the resulting Boolean correctness formula, so that the CPU time was reduced by
only 0.05 seconds. However, the speedup becomes dramatic for the dual-issue
superscalar benchmarks and ranges from 5.7 to 43.5 times. The maximum number of
BDD nodes is also reduced, ranging from 23% for 2xDLX-CA to less than 7% for 
2xDLX-CC-MC, relative to the BDD node count without the abstraction. 
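
The quoted ratios can be rechecked directly from the table entries. The short script below is only a convenience for the reader; the numbers are copied from Tables 1 and 2, and the helper names are made up for the example.

```python
# benchmark: (CPU time before [s], CPU time after [s], max nodes before, max nodes after)
rows = {
    "1xDLX-C": (0.26, 0.21, 2155, 714),
    "2xDLX-CA": (4.7, 0.82, 17492, 4004),
    "2xDLX-CC-MC": (226, 6, 315732, 21322),
    "2xDLX-CC-MC-EX": (3571, 82, 1229056, 118216),
}


def speedup(name):
    """CPU time ratio: time without the abstraction over time with it."""
    before, after, _, _ = rows[name]
    return round(before / after, 2)


def node_percentage(name):
    """Max BDD nodes with the abstraction, as a percentage of the count without it."""
    _, _, before, after = rows[name]
    return round(100 * after / before, 1)


time_saved_1x = round(rows["1xDLX-C"][0] - rows["1xDLX-C"][1], 2)
```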

Benchmark 1xDLX-C was first formally verified by Burch and Dill [6], who
required the user to manually provide a case-splitting formula, indicating the
conditions under which the processor will fetch and complete 1 new instruction. In order for

                   Auto.  e_ij variables                 Max. BDD  Memory  CPU Time  CPU Time
Processor          Abs.  Reg.  Br. Total  Other  Total      Nodes    [MB]       [s]     Ratio

2xDLX-CA           -      112    0   112     58    170     17,492     6.8       4.7      5.73
                   ✓        6    0     6     56     62      4,004     6.2      0.82
2xDLX-CA-BP        -      112   18   130     68    198     29,397     7.2      10.5      8.14
                   ✓        6   22    28     66     94      6,256       ?      1.29
2xDLX-CA-MC        -      142    0   142     76    218     68,895     8.2        21      7.19
                   ✓        6    0     6     75     81      9,470     6.9      2.92
2xDLX-CA-EX        -      122    0   122     92    214    143,330      12       142     21.85
                   ✓        6    0     6     92     98     21,496     8.6       6.5
2xDLX-CA-MC-EX     -      146    0   146    125    271    281,966      14       350        14
                   ✓        6    0     6    124    130     58,847     9.3        25
2xDLX-CA-MC-EX-BP  -      150   61   211    131    342    735,380      22     1,137     18.34
                   ✓        6   23    29    130    159     96,346     9.3        62
2xDLX-CC           -      116    0   116     69    185     40,586     7.7        20      10.2
                   ✓       10    0    10     65     75      7,823     6.3      1.96
2xDLX-CC-BP        -      122   25   147     82    229     74,555     8.8        43      8.21
                   ✓       10   28    38     78    116     19,988     6.6      5.24
2xDLX-CC-MC        -      142    0   142     94    236    315,732      14       226     37.67
                   ✓       10    0    10     92    102     21,322     7.4         6
2xDLX-CC-EX        -      122    0   122    103    225    413,732      18       486     17.36
                   ✓       10    0    10    102    112     68,159     9.6        28
2xDLX-CC-MC-EX     -      146    0   146    150    296  1,229,056      34     3,571     43.55
                   ✓       10    0    10    148    158    118,216      11        82
2xDLX-CC-MC-EX-BP  -      152   35   187    158    345  1,009,206      29     2,593     21.61
                   ✓       10   32    42    156    198    185,249      12       120

Table 2. Statistics for the formal verification of the dual-issue superscalar processors.
“Auto. Abs.” stands for “automatically abstracted register file” (✓ = applied,
- = not applied). The columns from “Reg.” through the second “Total” count BDD variables.
The $e_{ij}$ “Reg.” variables are the ones that encode equality comparisons between
register identifiers. The $e_{ij}$ “Br.” variables are the ones that encode equality
comparisons between predicted and actual branch address targets. A “?” marks a value
that is illegible in the source.

Hosabettu [11] to formally verify the same benchmark, he needed a month of manual 
work for the definition of completion functions — one per unfinished instruction, 
describing how that instruction would be completed, assuming all instructions ahead 
of it in program order have been completed. The completion functions for all instruc- 
tions in flight are composed manually in order to compute the abstraction function — 
mapping an implementation state to a specification state — necessary for the commuta- 
tive diagram. Ritter, et al [13] could verify the same benchmark after running their 
symbolic simulator for 65 minutes of CPU time. 

Benchmark 2xDLX-CA was first verified by Burch [7] who needed around 30
minutes of CPU time (on a slower Sun4 than the one used for the experiments in this
paper) only after manually defining 28 case-splitting formulas and decomposing the
commutative correctness diagram into 3 diagrams that are easier to verify. However, 
that decomposition was subtle enough to warrant the publication of its correctness 
proof as a separate paper [22]. Hosabettu [11] needed again a month of manual work 
for the definition of the completion functions for this design. Note that the tool [21] 
used for the experiments in this paper is completely automatic. It does not require 
manual intervention except for defining the controlled flushing of the implementation 
processor — something that takes a couple of minutes and has to be done just once for 
each design. 

7 Conclusions and Future Work 

An order of magnitude reduction was achieved in the CPU time for the formal verifica- 
tion of dual-issue superscalar microprocessors with multicycle functional units, excep- 
tions, and branch prediction. That was possible by automatically applying a system of 
conservative transformation rules for abstracting the register file in a way that sepa- 
rates the forwarding and stalling logic, modeling completely only those levels of for- 
warding that directly interact with stalling conditions, but abstracting conservatively 
the rest. The transformation rules are based on the assumption that reads and writes of 
user-visible state are not reordered and occur in their program sequence.

The effectiveness of a set of conservative rewriting rules depends on accounting 
for variations in the description style used for the implementation and specification 
processors, so that false negatives are possible when certain cases are not considered. 
However, the potential gain from such rewriting rules is a dramatic speedup of up to 
orders of magnitude, as demonstrated in this paper. 

The same transformation rules can be expected to speed up the checking of liveness
properties for in-order microprocessors, where a design will be simulated for a
fixed number of clock cycles (more than one) in order to prove that it will eventually
complete a new instruction. Future work will extend the transformation rules for
application to out-of-order microprocessors.

Abstract. We show how to attack the problem of model checking a
C program with recursive procedures using an abstraction that we for- 
mally define as the composition of the Boolean and the Cartesian ab- 
stractions. It is implemented through a source-to-source transformation 
into a ‘Boolean’ C program; we give an algorithm to compute the trans- 
formation with a cost that is exponential in its theoretical worst-case 
complexity but feasible in practice. 



1 Introduction 

Abstraction is a key issue in model checking. Much attention has been given to 
Boolean abstraction (a.k.a. existential abstraction or predicate abstraction); see 
e.g. [10,15,6,16,13,11]. The idea of Boolean abstraction is to map states to
‘abstract’ states according to their evaluation under a finite set of predicates (Boolean
expressions over program variables) on states. The predicates induce an ‘abstract’
system with a transition relation over the abstract states. An approximation of 
the set of reachable concrete states (in fact, an inductive invariant) is obtained 
through a fixpoint of the ‘abstract’ post operator. 

Motivated by the fact that computing the Boolean abstraction (i.e. comput- 
ing the transition relation between ‘abstract’ states) is prohibitively costly, we 
propose a new abstraction, obtained by adding the Cartesian abstraction on top 
of the Boolean abstraction. The Cartesian abstraction underlies the attribute 
independence in certain kinds of program analysis (see [9]). It is used to approx- 
imate a set of tuples by the smallest Cartesian product containing this set. The 
new abstraction is induced by predicates over states, but it cannot be defined by 
a mapping over states (i.e., a state cannot be assigned a unique abstract value). 
We use the framework of abstract interpretation [8] and Galois connections to 
specify our abstraction as the formal composition of two abstractions. 

We present an algorithm for computing the ‘ideal’ abstract post operator
($\mathit{post}^{\#}_{b\text{-}c}$) wrt. the new abstraction (defined through a Galois
connection). The algorithm is exponential in its worst-case complexity, but it is
feasible in practice; it is the first algorithm in this context of abstract model
checking that does not compute the value explicitly for each ‘abstract’ state. This gain
in efficiency must, in theory, be traded with a loss of precision. We identify the single
causes of loss of precision under the Cartesian abstraction. To eliminate most of these,
we introduce three refinements of $\mathit{post}^{\#}_{b\text{-}c}$ that are based on
standard concepts from program analysis. We have an implementation that combines all
three refinements and that makes the loss of precision practically negligible.

* On leave from Max Planck Institute, Saarbrücken, Germany.

The use of Cartesian abstraction allows us to represent the abstract post
operator of a C program in the form of a Boolean program [2]. Boolean programs
are C programs where all expressions and all variables range over the three
truth values 1, 0 and * (for ‘unknown’). As C programs, Boolean programs have 
the usual update statements, and they may have procedures with call-by-value 
parameter passing, local variables, and recursion. 

We next explain the specific context of our work. The SLAM project¹ at
Microsoft Research is building processes and tools for checking temporal properties
of system software written in common programming languages, such as C. 

The existence of both infinite control and infinite data in (even sequential)
software makes model checking of software difficult. Infinite control comes from
procedural abstraction and recursion. Infinite data comes from the existence of
unbounded data types such as integers and pointer-based data structures.
Infinite control and unbounded arithmetic data have been studied in model checking
in isolation, namely for pushdown systems resp. protocols, parameterized
systems or timed and hybrid systems (see e.g. [17]). However, the combination
of unbounded stack-based control and unbounded data has not been handled
before.²

The SLAM project addresses this fundamental problem through a separation 
of concerns that abstracts infinite data domains through Cartesian and Boolean 
abstractions, and then uses well-known techniques [22,19] to analyze the
resultant Boolean program, which has infinite control (but ‘finite data’). The data
are abstracted according to their evaluation under a given set $\mathcal{P}$ of predicates
on states of the C program.

Our working hypothesis is that for many interesting temporal properties of 
real-life system software, we can find suitable predicates such that the abstraction 
is precise enough to prove the desired invariant. Refinement can be accomplished 
by the addition of new predicates. 

Given an invariant Inv to check on a C program, the SLAM process has three
phases, starting with an initial set of predicates $\mathcal{P}$ and repeating the phases
iteratively, halting if the invariant Inv is either proved or disproved (but possibly
non-terminating):

1. construct an abstract post operator under the abstraction induced by $\mathcal{P}$;

2. model check the Boolean program that represents the abstract post operator;

3. discover new predicates and add them to the set $\mathcal{P}$ in order to refine the
abstraction.
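
The three phases above form an iterative loop that can be sketched as follows. This is a schematic illustration only: the callables `abstract`, `model_check`, and `refine` are hypothetical stand-ins for the three phases (they are not the interfaces of c2bp or bebop), the verdict strings are invented, and the iteration bound `max_iters` is an assumption added because the process may not terminate.

```python
def slam_loop(program, invariant, predicates, abstract, model_check, refine,
              max_iters=10):
    """Iterate abstraction, model checking, and predicate discovery."""
    predicates = set(predicates)
    for _ in range(max_iters):
        boolean_program = abstract(program, predicates)           # Phase 1
        verdict, trace = model_check(boolean_program, invariant)  # Phase 2
        if verdict == "proved":
            return "proved", predicates
        new_predicates = refine(program, trace, predicates)       # Phase 3
        if not new_predicates:
            # No refinement possible: treat the abstract trace as a real violation.
            return "disproved", predicates
        predicates |= new_predicates
    return "unknown", predicates


# Toy stand-ins, only to show how the loop is driven.
def toy_abstract(program, predicates):
    return frozenset(predicates)


def toy_model_check(boolean_program, invariant):
    if "x==y" in boolean_program:
        return "proved", None
    return "counterexample", ["line 9"]


def toy_refine(program, trace, predicates):
    return {"x==y"} - predicates


result = slam_loop("P", "line 9 is unreachable", {"z==0"},
                   toy_abstract, toy_model_check, toy_refine)
```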

In this paper, we address the issue of abstraction in Phase (1). In principle,
Phases (2) and (3) will follow the lines of other work on interprocedural program
analysis [22,19] and abstraction refinement [4,15]. For more detail on Phase (2),
see [2].

¹ http://research.microsoft.com/slam/

² There are other promising attempts at model checking for software, of course, such
as the Bandera project, for example, where non-recursive procedures are handled
through inlining [7].

We note that the specific context of the SLAM project has the following 
consequences for the abstraction of the post operator and its computation in 
Phase (1): 

— It is important to give a concise definition of the abstract post operator, not 
only to guide its implementation but also to guide the refinement process 
(i.e. to help identify the cause of imprecision in a given abstraction). 

— The abstract post operator must be computed for its entire domain. That is, 
it cannot be restricted a priori to a subset of its domain. At the moment when 
the abstract post operator for a statement within a procedure is computed, it 
is generally impossible to foresee which values the statement will be applied 
to. 

In the work on Boolean abstraction that is most closely related to ours, Graf 
and Saïdi [13] define an approximation of the Boolean abstraction of the post
operator; our abstraction can be used to formalize that approximation in terms 
of a Galois connection, using a new abstract domain. 

The procedure of [13] computes the image of their abstract post operator for 
each ‘abstract state’ with a linear number of calls to a theorem prover (in the 
number n of predicates inducing the Boolean abstraction). This is better than 
computing the image of the standard Boolean abstraction of the post operator, 
which requires exponentially many calls to a theorem prover; but still, it is only 
feasible if done ‘on demand’, i.e. for each reachable ‘abstract’ state (and if the 
number of those remains small). 

In our setting, the procedure of [13] would require a fixed number $2^n \cdot 2 \cdot n$ of
calls to a theorem prover. In this paper, we give a procedure with $O(2^n) \cdot 2 \cdot n$
calls; i.e., in comparison with [13], we replace the fixed (or best-case) factor $2^n$ by
a worst-case factor $O(2^n)$, which makes all the difference for practical concerns.

2 Example C Program 

In this paper, we are concerned with two SLAM tools: (1) c2bp, which takes a
C program and a set of predicates, and produces an abstract post operator
represented by a Boolean program [1], and (2) bebop, a model checker for Boolean
programs [2]. We illustrate c2bp and bebop using a simple C program P shown
in the left-hand side of Figure 1. The property we want to check is that the
assertion in line 9 is never reached, regardless of the context in which foo is
called. The right-hand side of Figure 1 shows the Boolean program B that c2bp
produces from P, given the set of predicates { (z==0), (x==y) }. The Boolean
variables b1 and b2 represent the predicates (z==0) and (x==y), respectively.
Each statement of the C program is translated into a corresponding statement
of the Boolean program. For example, the statement z = 0; in line 2 is
translated to b1 := 1;. The translation of the statement x++; in line 5 states that
if b2 is 1 before the statement, then it is guaranteed to be 0 after the statement,

                                    decl b1, b2;
     int x, y, z, w;                /* b1 stands for predicate (z==0),
                                       b2 stands for predicate (x==y) */

     void foo()                     void foo()
     {                              begin
[1]    do {                    [1]    do
[2]      z = 0;                [2]      b1 := 1;
[3]      x = y;                [3]      b2 := 1;
[4]      if (w) {              [4]      if (*) then
[5]        x++;                [5]        b2 := choose(0, b2);
[6]        z = 1;              [6]        b1 := 0;
         }                              fi
[7]    } while (x != y);       [7]    while (!b2)
[8]    if (z) {                [8]    if (!b1) then
[9]      assert(0);            [9]      assert(0);
       }                              fi
     }                              end

                                    bool choose(e1, e2)
                                    begin
                               [10]   if (e1) then
                               [11]     return (1);
                               [12]   elsif (e2) then
                               [13]     return (0);
                               [14]   else
                               [15]     return (*);
                                      fi
                                    end

Fig. 1. An example C program, and the Boolean program produced by c2bp
using predicates (z==0) and (x==y)



otherwise the value of b2 after the statement is unknown, represented by * in
line 15. The Boolean program B can now be fed to bebop, with the question: “is
line 9 reachable in B?”, and bebop answers “no”. We thus conclude that line 9
is not reachable in the C program P either.
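
For a program as small as B, the answer can be reproduced by brute-force enumeration of the Boolean states. The sketch below is not bebop's algorithm; it assumes that `*` denotes a nondeterministic choice between 0 and 1, that b1 and b2 may hold any values on entry to foo, and that the loop repeats while x != y, i.e. while b2 is 0. All function names are made up for the example.

```python
from itertools import product


def choose(e1, e2):
    """Set of possible results of choose(e1, e2), lines 10-15 of B."""
    if e1:
        return {1}
    if e2:
        return {0}
    return {0, 1}  # '*': unknown, either value is possible


def loop_body(b1, b2):
    """All (b1, b2) states possible after one pass through lines 2-6."""
    b1, b2 = 1, 1                  # lines 2 and 3
    results = {(b1, b2)}           # 'if (*)' branch not taken
    for new_b2 in choose(0, b2):   # line 5, branch taken
        results.add((0, new_b2))   # line 6 sets b1 to 0
    return results


def line9_reachable():
    """Explore every reachable state of B and test the guard of line 8."""
    frontier = set(product((0, 1), repeat=2))  # arbitrary calling context
    after_body = set()
    while frontier:
        state = frontier.pop()
        for nxt in loop_body(*state):
            if nxt not in after_body:
                after_body.add(nxt)
                if not nxt[1]:     # line 7: repeat while b2 is 0
                    frontier.add(nxt)
    loop_exits = {s for s in after_body if s[1]}
    return any(not b1 for b1, _ in loop_exits)  # line 8: if (!b1)
```

Under these assumptions the only state that leaves the loop is (b1, b2) = (1, 1), so the guard in line 8 is never satisfied.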



3 Correctness 

We fix a program (e.g. a C program) generating a transition system with a
set $\mathit{States}$ of states $s_1, s_2, \ldots$ and a transition relation
$s \rightarrow s'$. The operator $\mathit{post}$ on sets of states is defined as usual:
$\mathit{post}(S) = \{s' \mid \text{exists } s \in S : s \rightarrow s'\}$.

In Section 7 we will use the ‘weakest precondition’ operator $\widetilde{\mathit{pre}}$
on sets of states:
$\widetilde{\mathit{pre}}(S') = \{s \mid \text{for all } s' \text{ such that } s \rightarrow s' : s' \in S'\}$.

In order to define correctness, we fix a subset $\mathit{init}$ of initial states and a
subset $\mathit{unsafe}$ of unsafe states (its complement
$\mathit{safe} = \mathit{States} - \mathit{unsafe}$ is the set of
safe states). The set of reachable states (reachable from an initial state) is the
least fixpoint of $\mathit{post}$ that contains $\mathit{init}$, also called the closure
of $\mathit{init}$ under $\mathit{post}$,
$\mathit{post}^{*}(\mathit{init}) = \mathit{init} \cup \mathit{post}(\mathit{init}) \cup \ldots$

The given program is correct if no unsafe state is reachable; i.e., if
$\mathit{post}^{*}(\mathit{init}) \subseteq \mathit{safe}$. A safe (inductive) invariant
is a set of states $S$ that contains the set of initial states, is closed under the
operator $\mathit{post}$, and is contained in the set of all safe states; formally:
$S \subseteq \mathit{safe}$, $S \supseteq \mathit{post}(S)$, and $S \supseteq \mathit{init}$.

Correctness is established by computing a safe invariant. One way to do so
is to find an ‘abstraction’ $\mathit{post}^{\#}$ of the operator $\mathit{post}$ and
compute the closure of $\mathit{post}^{\#}$ on $\mathit{init}$ (and check that it is a
subset of $\mathit{safe}$). In the next section, we
will make the idea of abstraction formally precise.
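
For a finite transition system these definitions can be executed directly. The sketch below is an illustration under simple assumptions: states are integers, the transition relation is an explicit set of pairs, and the four-state example system is invented.

```python
def post(trans, S):
    """Successors of the states in S."""
    return {t for (s, t) in trans if s in S}


def pre_tilde(trans, states, S):
    """Weakest precondition: states all of whose successors lie in S."""
    return {s for s in states if all(t in S for (u, t) in trans if u == s)}


def closure(trans, init):
    """Least fixpoint of post containing init, i.e. the reachable states."""
    reach = set(init)
    while True:
        new = post(trans, reach) - reach
        if not new:
            return reach
        reach |= new


def is_safe_invariant(trans, S, init, safe):
    """Check the three conditions on a safe (inductive) invariant."""
    return S <= safe and post(trans, S) <= S and init <= S


# Invented example: state 3 is unsafe and unreachable from state 0.
states = {0, 1, 2, 3}
trans = {(0, 1), (1, 2), (2, 1), (3, 0)}
init, unsafe = {0}, {3}
safe = states - unsafe
reachable = closure(trans, init)
```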



4 Boolean Abstraction 

For the purpose of this paper, we fix a finite set $\mathcal{P}$ of state predicates,
$\mathcal{P} = \{p_1, \ldots, p_n\}$. A predicate $p_i$ denotes the subset of states that
satisfy the predicate, $\{s \in \mathit{States} \mid s \models p_i\}$. The predicate is
usually defined by a Boolean expression over program variables.

We distinguish the terms approximation and abstraction. The set $\mathcal{P}$ of state
predicates defines the Boolean approximation of a set of states $S$ as
$\mathit{Boolean}(S)$, the smallest set containing $S$ that can be denoted by a Boolean
expression over predicates in $\mathcal{P}$ (formed as usual with the Boolean operators
$\wedge$, $\vee$, $\neg$); this set is sometimes referred to as the Boolean covering of
the set. This approximation can be defined through an abstract domain and two functions
$\alpha_{bool}$ and $\gamma_{bool}$ that we define below (following the abstract
interpretation framework [8]); namely, the Boolean approximation of a set of states $S$
is the set of states $\mathit{Boolean}(S) = \gamma_{bool}(\alpha_{bool}(S))$. The two
functions are used to directly define the operator $\mathit{post}^{\#}_{bool}$
on the abstract domain as an abstraction of the fixpoint operator $\mathit{post}$ over
sets of states.

Having fixed $\mathcal{P}$, we define the abstract domain $\mathit{AbsDom}_{bool}$ as the
set of all sets $V$ of bitvectors $v$ of length $n$ (one bit per predicate
$p_i \in \mathcal{P}$, for $i = 1, \ldots, n$),
$\mathit{AbsDom}_{bool} = 2^{\{0,1\}^n}$, together with subset inclusion as the partial
ordering.

The abstraction function is the mapping from the concrete domain $2^{\mathit{States}}$,
the set of sets of states (again with subset inclusion as the partial ordering), to the
abstract domain, assigning a set of states $S$ the set of bitvectors representing
the Boolean covering of $S$,

$\alpha_{bool} : 2^{\mathit{States}} \rightarrow \mathit{AbsDom}_{bool}$

$S \mapsto \{(v_1, \ldots, v_n) \mid S \cap \{s \mid s \models v_1 \cdot p_1 \wedge \ldots \wedge v_n \cdot p_n\} \neq \emptyset\}$

where $0 \cdot p_i = \neg p_i$ and $1 \cdot p_i = p_i$. The meaning function is the mapping

$\gamma_{bool} : \mathit{AbsDom}_{bool} \rightarrow 2^{\mathit{States}}$

$V \mapsto \{s \mid \text{exists } (v_1, \ldots, v_n) \in V : s \models v_1 \cdot p_1 \wedge \ldots \wedge v_n \cdot p_n\}$.

Given $\mathit{AbsDom}_{bool}$ and the function $\alpha_{bool}$ (which forms a Galois
connection together with the function $\gamma_{bool}$), the ‘best’ abstraction of the
operator $\mathit{post}$ is the operator $\mathit{post}^{\#}_{bool}$ on sets of
bitvectors defined by

$\mathit{post}^{\#}_{bool} = \alpha_{bool} \circ \mathit{post} \circ \gamma_{bool}$

where the functional composition $f \circ g$ of two functions $f$ and $g$ is defined from
right to left; i.e., $(f \circ g)(x) = f(g(x))$.
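
On a finite state space the Boolean abstraction can be computed by enumeration. The sketch below is an illustration only: the state space of pairs (x, z), the two predicates (z==0 and x==0), and the transition relation are invented for the example, and a real implementation would query a theorem prover instead of enumerating states.

```python
STATES = [(x, z) for x in range(3) for z in range(2)]
# Invented transition relation: x is incremented modulo 3, z is unchanged.
TRANS = {(s, ((s[0] + 1) % 3, s[1])) for s in STATES}
# Invented predicates p1: z == 0 and p2: x == 0.
PREDS = [lambda s: s[1] == 0, lambda s: s[0] == 0]


def bits(s):
    """The bitvector (v1, ..., vn) describing which predicates hold in s."""
    return tuple(int(p(s)) for p in PREDS)


def alpha_bool(S):
    """Bitvectors whose conjunction of (possibly negated) predicates meets S."""
    return {bits(s) for s in S}


def gamma_bool(V):
    """States described by some bitvector in V."""
    return {s for s in STATES if bits(s) in V}


def post(S):
    return {t for (s, t) in TRANS if s in S}


def post_bool(V):
    """Best abstraction of post: alpha_bool after post after gamma_bool."""
    return alpha_bool(post(gamma_bool(V)))


S = {(1, 0)}
covering = gamma_bool(alpha_bool(S))  # Boolean(S)
image = post_bool(alpha_bool(S))
```

Here the covering of S also contains (2, 0), because the two predicates cannot distinguish x = 1 from x = 2.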

5 Cartesian Abstraction 

Given the vector domain $D_1 \times \ldots \times D_n$, the Cartesian approximation
$\mathit{Cartesian}(V)$ of a set of vectors $V$ is the smallest Cartesian product of
subsets of $D_1, \ldots, D_n$ that contains the set. It can be defined by the Cartesian
product of the projections $\Pi_i(V)$,
$\mathit{Cartesian}(V) = \Pi_1(V) \times \ldots \times \Pi_n(V)$, where
$\Pi_1(V) = \{v_1 \mid (v_1, \ldots, v_n) \in V\}$ etc. In order to formalize the
Cartesian approximation of a fixpoint operator, one uses the abstraction function from
the concrete domain of sets of tuples to the abstract domain of tuples of sets (with
pointwise subset inclusion as the partial ordering),

$\alpha_{cartesian} : 2^{D_1 \times \ldots \times D_n} \rightarrow 2^{D_1} \times \ldots \times 2^{D_n}$

$V \mapsto (\Pi_1(V), \ldots, \Pi_n(V))$

and the meaning function $\gamma_{cartesian}$ mapping a tuple of sets
$(M_1, \ldots, M_n)$ to their Cartesian product $M_1 \times \ldots \times M_n$. I.e., we
have $\mathit{Cartesian}(V) = \gamma_{cartesian} \circ \alpha_{cartesian}(V)$.

In general, one has to account formally for the empty set (i.e., introduce a
special bottom element $\bot$ and identify each tuple of sets that has at least one
empty component); in the context of the fixpoints considered here (we look at
the smallest fixpoint that is greater than a given element, e.g.
$\alpha_{bool}(\mathit{init})$), we can gloss over this issue.

We next formalize the Cartesian approximation for sets of bitvectors. The
nonempty sets of Boolean values are of one of three forms: $\{0\}$, $\{1\}$ or $\{0, 1\}$.
It is convenient to write 0 for $\{0\}$, 1 for $\{1\}$ and * for $\{0, 1\}$, and thus
represent a tuple of sets of Boolean values by what we call a trivector, which is an
element of $\{0, 1, *\}^n$. We therefore introduce the abstract domain of trivectors,
$\mathit{AbsDom}_{cartesian} = \{0, 1, *\}^n$ (again, we gloss over the issue of a special
trivector $\bot$). The partial ordering $\leq$ is the pointwise extension of the partial
order given by $0 \leq *$ and $1 \leq *$; i.e., for two trivectors $(v_1, \ldots, v_n)$
and $(v'_1, \ldots, v'_n)$, $(v_1, \ldots, v_n) \leq (v'_1, \ldots, v'_n)$ if
$v_1 \leq v'_1, \ldots, v_n \leq v'_n$. The Cartesian abstraction $\alpha_{cartesian}$
maps a set of bitvectors $V$ to a trivector,

$\alpha_{cartesian} : \mathit{AbsDom}_{bool} \rightarrow \mathit{AbsDom}_{cartesian}$, $V \mapsto (v_1, \ldots, v_n)$

where, for $i = 1, \ldots, n$, (a) $v_i = 0$ if $\Pi_i(V) = \{0\}$; (b) $v_i = 1$ if
$\Pi_i(V) = \{1\}$; (c) $v_i = *$ if $\Pi_i(V) = \{0, 1\}$.

The meaning $\gamma_{cartesian}(v)$ of a trivector $v$ is the set of bitvectors that are
smaller than $v$ (wrt. the partial ordering on trivectors given above); i.e.,
it is the Cartesian product of the $n$ sets of bit values denoted by the components
of $v$. The meaning function
$\gamma_{cartesian} : \mathit{AbsDom}_{cartesian} \rightarrow \mathit{AbsDom}_{bool}$
forms a Galois connection with $\alpha_{cartesian}$.
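
The Cartesian abstraction on bitvectors, and the precision it can lose, can be seen in a small sketch. The representation of `*` as the string "*" and the function names are choices made for this example; like the text, the sketch ignores the bottom element and therefore expects a nonempty set.

```python
from itertools import product

STAR = "*"  # stands for the set {0, 1}


def alpha_cartesian(V):
    """Map a nonempty set of bitvectors to a trivector, component by component."""
    n = len(next(iter(V)))
    trivector = []
    for i in range(n):
        projection = {v[i] for v in V}
        trivector.append(STAR if projection == {0, 1} else projection.pop())
    return tuple(trivector)


def gamma_cartesian(trivector):
    """All bitvectors smaller than the trivector (its Cartesian product)."""
    return set(product(*[(0, 1) if c == STAR else (c,) for c in trivector]))


correlated = {(0, 1), (1, 0)}            # the two bits always differ
approx = gamma_cartesian(alpha_cartesian(correlated))
exact = gamma_cartesian(alpha_cartesian({(1, 0), (1, 1)}))
```

The set `correlated` is approximated by all four bitvectors, so the dependence between the two components is lost, while a set that already is a Cartesian product is represented exactly.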



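6 The Abstract Post Operator $\mathit{post}^{\#}_{b\text{-}c}$ over Trivectors

We define a new Galois connection by composing the ones considered in the
previous two sections,

$\alpha_{b\text{-}c} : 2^{\mathit{States}} \rightarrow \mathit{AbsDom}_{cartesian}$, $\alpha_{b\text{-}c} = \alpha_{cartesian} \circ \alpha_{bool}$

$\gamma_{b\text{-}c} : \mathit{AbsDom}_{cartesian} \rightarrow 2^{\mathit{States}}$, $\gamma_{b\text{-}c} = \gamma_{bool} \circ \gamma_{cartesian}$

and the abstract post operator over trivectors,
$\mathit{post}^{\#}_{b\text{-}c} : \mathit{AbsDom}_{cartesian} \rightarrow \mathit{AbsDom}_{cartesian}$,
defined by
$\mathit{post}^{\#}_{b\text{-}c} = \alpha_{b\text{-}c} \circ \mathit{post} \circ \gamma_{b\text{-}c}$.

We have thus given a formalization of the fixpoint operator that implicitly
defines the invariant $\mathit{Inv}_1$ given by $X_1$ in [13]; i.e., the invariant is the
meaning (under $\gamma_{b\text{-}c}$) of the least fixpoint of
$\mathit{post}^{\#}_{b\text{-}c}$ that is not smaller than the abstraction
of $\mathit{init}$ (under $\alpha_{b\text{-}c}$), or
$\mathit{Inv}_1 = \gamma_{b\text{-}c}((\mathit{post}^{\#}_{b\text{-}c})^{*}(\alpha_{b\text{-}c}(\mathit{init})))$.
The invariant $\mathit{Inv}_1$ is represented abstractly by one trivector, i.e. it is the
Cartesian product of sets each described by $p$, $\neg p$, or $p \vee \neg p$ (i.e. true)
where $p$ is a predicate of the set $\mathcal{P}$.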

7 The c2bp Algorithm to Compute $\mathit{post}^{\#}_{b\cdot c}$

The algorithm takes as input the transition system (defining the operators
$\mathit{post}$ and $\widetilde{\mathit{pre}}$) and the set of $n$ predicates $\mathcal{P}$; as output it produces the
representation of $\mathit{post}^{\#}_{b\cdot c}$ in the form of a Boolean program over $n$ ‘Boolean’
variables $v_1, \ldots, v_n$ (whose values range over the domain $\{0,1,*\}$). Each
statement of the Boolean program is a multiple assignment statement of the form
$(v_1, \ldots, v_n) := (e_1, \ldots, e_n)$, where $e_1, \ldots, e_n$ are expressions over $v_1, \ldots, v_n$ that
are evaluated to a value in $\{0, 1, *\}$. We write $e[v_1, \ldots, v_n]$ for $e$ if we want to
stress that $e$ is an expression over $v_1, \ldots, v_n$. The Boolean program represents
the operator $\mathit{post}^{\#}_{bp}$ over trivectors by

\[ \mathit{post}^{\#}_{bp}((v_1, \ldots, v_n)) = (v'_1, \ldots, v'_n) \quad \text{if } v'_1 = e_1[v_1, \ldots, v_n], \ldots, v'_n = e_n[v_1, \ldots, v_n]. \]

We will now explain how the algorithm computes the expressions $e_i[v_1, \ldots, v_n]$
for each $i = 1, \ldots, n$. We first define the Boolean under-approximation of a set $S$
wrt. $\mathcal{P}$ as the greatest Boolean expression over predicates in $\mathcal{P}$ whose denotation
is contained in $S$; formally, $\mathcal{F}(S) = \nu E \in \mathit{BoolExpr}(\mathcal{P}).\ \{s \mid s \models E\} \subseteq S$. That
is, the set of states denoted by $\mathcal{F}(S)$ is $\mathit{States} - (\gamma_{\mathit{bool}} \circ \alpha_{\mathit{bool}})(\mathit{States} - S)$. For
the purpose of defining the algorithm, the set $\mathit{BoolExpr}(\mathcal{P})$ consists of Boolean
expressions in the form of disjunctions of conjunctions of possibly negated
predicates from $\mathcal{P}$. The ordering $e \leq e'$ is such that each disjunct of $e$ implies some
disjunct of $e'$ (e.g., $p_1$ is greater than $p_1 \wedge p_2 \vee p_1 \wedge \neg p_2$).

By repeated calls to a theorem prover,^ the algorithm computes the two
Boolean expressions $E_i(0)$ and $E_i(1)$ over the predicates $p_1, \ldots, p_n$:

\[ E_i(0) = \mathcal{F}(\widetilde{\mathit{pre}}(\{s \mid s \models \neg p_i\})), \]

\[ E_i(1) = \mathcal{F}(\widetilde{\mathit{pre}}(\{s \mid s \models p_i\})). \]

We define the two Boolean expressions $e_i(1)$ and $e_i(0)$ over the variables
$v_1, \ldots, v_n$ by direct correspondence from the two Boolean expressions $E_i(0)$
and $E_i(1)$ over the predicates $p_1, \ldots, p_n$.

The expression $e_i$ over the variables $v_1, \ldots, v_n$ that defines the $i$-th value
of the successor trivector of the Boolean program is $e_i = \mathit{choose}(e_i(1), e_i(0))$,
where the symbol $\mathit{choose}$ stands for an if-then-elseif-then-else combinator on
two Boolean expressions; i.e., the expression $\mathit{choose}(e, e')$ applied to two Boolean
expressions $e$ and $e'$, each over the variables $v_1, \ldots, v_n$, evaluates as follows:

\[ \mathit{choose}(e[v_1, \ldots, v_n], e'[v_1, \ldots, v_n]) =
\begin{cases}
1 & \text{if } (v_1, \ldots, v_n) \models e \\
0 & \text{else if } (v_1, \ldots, v_n) \models e' \\
* & \text{else}
\end{cases} \]



The satisfaction of a Boolean expression $e$ by a trivector $(v_1, \ldots, v_n)$ is defined
as one expects, namely $(v_1, \ldots, v_n) \models e$ if all bitvectors in $\gamma_{\mathit{cartesian}}((v_1, \ldots, v_n))$
satisfy $e$. Thus, for example, $(0, 1, *) \models \neg v_1 \wedge v_2$ but $(0, 1, *) \not\models v_3$ and
$(0, 1, *) \not\models \neg v_3$. (The extension of the Boolean operators to the domain $\{0, 1, *\}$ is defined
accordingly.)
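The satisfaction relation and the choose combinator can be illustrated as follows. This is a minimal sketch under our own assumptions: Boolean expressions are modelled as Python callables over bitvectors (tuples of 0/1), and the names `satisfies` and `choose` are ours, not part of c2bp.

```python
from itertools import product

STAR = "*"  # the third value, standing for {0, 1}


def gamma_cartesian(trivector):
    """Enumerate the bitvectors denoted by a trivector."""
    choices = [(0, 1) if c == STAR else (c,) for c in trivector]
    return product(*choices)


def satisfies(trivector, expr):
    """A trivector satisfies expr iff every bitvector in its meaning does."""
    return all(expr(bits) for bits in gamma_cartesian(trivector))


def choose(e_pos, e_neg):
    """Build the if-then-elseif-then-else combinator over trivectors."""
    def evaluate(trivector):
        if satisfies(trivector, e_pos):
            return 1
        if satisfies(trivector, e_neg):
            return 0
        return STAR
    return evaluate


# The example from the text: (0, 1, *) satisfies "not v1 and v2",
# but neither "v3" nor "not v3".
t = (0, 1, STAR)
holds = satisfies(t, lambda b: (not b[0]) and b[1])
v3 = satisfies(t, lambda b: b[2])
not_v3 = satisfies(t, lambda b: not b[2])
```

Enumerating the meaning of a trivector is exponential in the number of * components; the sketch is meant to mirror the definition, not to be efficient.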

Proposition 1 (Correctness). The result of the c2bp algorithm is a Boolean
program representing the Boolean and Cartesian abstraction of the operator $\mathit{post}$,
i.e. $\mathit{post}^{\#}_{bp} = \mathit{post}^{\#}_{b\cdot c}$.



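Proof. We define the $n$ abstraction functions $\alpha^{(i)}_{b\cdot c}$ by

\[ \alpha^{(i)}_{b\cdot c}(M) =
\begin{cases}
1 & \text{if } M \subseteq \{s \mid s \models p_i\} \\
0 & \text{if } M \subseteq \{s \mid s \models \neg p_i\} \\
* & \text{if neither}
\end{cases} \]

and the $i$-th abstract post function $\mathit{post}^{\#(i)}_{b\cdot c}$ by $\mathit{post}^{\#(i)}_{b\cdot c} = \alpha^{(i)}_{b\cdot c} \circ \mathit{post} \circ \gamma_{b\cdot c}$.

Since the value of any nonempty set of states $S$ under the abstraction $\alpha_{b\cdot c}$ is
the trivector

\[ \alpha_{b\cdot c}(S) = (\alpha^{(1)}_{b\cdot c}(S), \ldots, \alpha^{(n)}_{b\cdot c}(S)), \]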



^ We consider the theorem prover as an oracle, which does exist for most practical
concerns. It is easy to see that theoretically such an oracle does not exist and that
$\mathit{post}^{\#}_{b\cdot c}$ cannot be computed; i.e., the problem of deciding whether an
operator is equal to $\mathit{post}^{\#}_{b\cdot c}$ (or $\mathit{post}^{\#}_{\mathit{bool}}$) is undecidable.
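we can express the abstract post operator $\mathit{post}^{\#}_{b\cdot c}$ over trivectors as the tuple of
the abstract post functions, each mapping trivectors to values in $\{0, 1, *\}$,

\[ \mathit{post}^{\#}_{b\cdot c}((v_1, \ldots, v_n)) = (\mathit{post}^{\#(1)}_{b\cdot c}((v_1, \ldots, v_n)), \ldots, \mathit{post}^{\#(n)}_{b\cdot c}((v_1, \ldots, v_n))). \]

Now, we can represent the abstract post operator $\mathit{post}^{\#}_{b\cdot c}$ in terms of the sets
$V_i(0)$, $V_i(1)$ and $V_i(*)$, defined as the inverse images of the values 0, 1 or *,
respectively, under the $i$-th abstract post function $\mathit{post}^{\#(i)}_{b\cdot c}$.

\[ \begin{aligned}
V_i(0) &= \{(v_1, \ldots, v_n) \mid \mathit{post}^{\#(i)}_{b\cdot c}((v_1, \ldots, v_n)) = 0\} \\
V_i(1) &= \{(v_1, \ldots, v_n) \mid \mathit{post}^{\#(i)}_{b\cdot c}((v_1, \ldots, v_n)) = 1\} \\
V_i(*) &= \{(v_1, \ldots, v_n) \mid \mathit{post}^{\#(i)}_{b\cdot c}((v_1, \ldots, v_n)) = *\} \\
       &= \mathit{AbsDom}_{\mathit{cartesian}} - (V_i(0) \cup V_i(1))
\end{aligned} \]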

The statement of the proposition can now be expressed by the fact that the sets
$V_i(0)$, $V_i(1)$ and $V_i(*)$ are exactly the sets of trivectors that satisfy the Boolean
expressions $e_i(0)$, $e_i(1)$ or neither.

\[ \begin{aligned}
V_i(0) &= \{(v_1, \ldots, v_n) \mid (v_1, \ldots, v_n) \models e_i(0)\} \\
V_i(1) &= \{(v_1, \ldots, v_n) \mid (v_1, \ldots, v_n) \models e_i(1)\} \qquad (1) \\
V_i(*) &= \{(v_1, \ldots, v_n) \mid (v_1, \ldots, v_n) \not\models e_i(0),\ (v_1, \ldots, v_n) \not\models e_i(1)\}
\end{aligned} \]

That is, in order to prove the proposition we need to prove (1).

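Since $\mathit{AbsDom}_{\mathit{bool}}$ is a complete distributive lattice, the membership
of a trivector $(v_1, \ldots, v_n)$ in $V_i(0)$ is equivalent to the condition that
$\gamma_{\mathit{cartesian}}((v_1, \ldots, v_n))$ is contained in $B_i(0)$, the largest set of bitvectors that
is mapped to the value 0 by the function $\alpha^{(i)}_{b\cdot c} \circ \mathit{post} \circ \gamma_{\mathit{bool}}$. That is, if we define

\[ B_i(0) = \nu B \in \mathit{AbsDom}_{\mathit{bool}}.\ \alpha^{(i)}_{b\cdot c} \circ \mathit{post} \circ \gamma_{\mathit{bool}}(B) = 0 \]

then

\[ V_i(0) = \{(v_1, \ldots, v_n) \in \mathit{AbsDom}_{\mathit{cartesian}} \mid \gamma_{\mathit{cartesian}}((v_1, \ldots, v_n)) \subseteq B_i(0)\}. \qquad (2) \]

By definition of $\alpha^{(i)}_{b\cdot c}$, we can express the set of bitvectors $B_i(0)$ as

\[ B_i(0) = \nu B \in \mathit{AbsDom}_{\mathit{bool}}.\ \mathit{post} \circ \gamma_{\mathit{bool}}(B) \subseteq \{s \mid s \models \neg p_i\}. \]

The operators $\mathit{post}$ and $\widetilde{\mathit{pre}}$ form a Galois connection, i.e. $\mathit{post}(S) \subseteq S'$ if and
only if $S \subseteq \widetilde{\mathit{pre}}(S')$. Therefore, we can write $B_i(0)$ equivalently as

\[ B_i(0) = \nu B \in \mathit{AbsDom}_{\mathit{bool}}.\ \gamma_{\mathit{bool}}(B) \subseteq \widetilde{\mathit{pre}}(\{s \mid s \models \neg p_i\}). \]

Thus, $B_i(0)$ is exactly the set of all bitvectors that satisfy the Boolean
expression $e_i(0)$.

\[ B_i(0) = \{(v_1, \ldots, v_n) \in \{0,1\}^n \mid (v_1, \ldots, v_n) \models e_i(0)\} \]

This fact, together with (2), yields directly the characterization of $V_i(0)$ in (1).
The other two statements in (1) follow in a similar way. □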

Complexity. We need to compute $\mathcal{F}(S)$ for $2n$ sets $S$ that are either of the form
$S = \widetilde{\mathit{pre}}(\{s \in \mathit{States} \mid s \models p_i\})$ or of the form $S = \widetilde{\mathit{pre}}(\{s \in \mathit{States} \mid s \models \neg p_i\})$.

In order to compute each $\mathcal{F}(S)$, we need to find all minimal implicants of $S$ in
the form of a cube, i.e. a conjunction $C = \bigwedge_{i \in I} \ell_i$ of possibly negated predicates
(i.e., $\ell_i$ is $p_i$ or $\neg p_i$) such that $\{s \mid s \models C\} \subseteq S$. We use some quick syntactic
checks to find which of the predicates $p_i$ can possibly influence $S$ (i.e. such that $p_i$
or $\neg p_i$ can appear in a minimal implicant); usually, there are only a few of those.
‘Minimal’ here means: if an implicant $C$ is found, no larger conjunction $C \wedge p_j$
needs to be considered. Also, if $C$ is incompatible with $S$ (i.e., $\{s \mid s \models C\} \cap S = \emptyset$),
no larger conjunction needs to be considered (since no conjunction $C \wedge p_j$ can be
an implicant).
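The two pruning rules can be sketched as an enumeration of cubes by increasing size. This is an illustrative sketch, not the c2bp implementation: the theorem prover is replaced by two caller-supplied oracles (`is_implicant`, `is_incompatible`, both names ours), and the toy oracles below decide the queries by brute force over an assumed finite state set.

```python
from itertools import combinations, product


def minimal_implicant_cubes(n, is_implicant, is_incompatible):
    """Enumerate cubes over n predicates by increasing size.

    A cube is a frozenset of (predicate index, polarity) pairs.
    """
    found, dead = [], []  # minimal implicants / cubes incompatible with S
    for size in range(n + 1):
        for indices in combinations(range(n), size):
            for polarities in product((True, False), repeat=size):
                cube = frozenset(zip(indices, polarities))
                # Prune: extensions of an implicant are not minimal, and
                # extensions of an incompatible cube cannot be implicants.
                if any(c <= cube for c in found + dead):
                    continue
                if is_implicant(cube):
                    found.append(cube)
                elif is_incompatible(cube):
                    dead.append(cube)
    return found


# Toy instance (hypothetical): states are the integers 0..10,
# predicates are x > 5 and x < 5, and S is the set of states with x >= 5.
STATES = range(0, 11)
PREDICATES = [lambda x: x > 5, lambda x: x < 5]
S = {x for x in STATES if x >= 5}


def cube_states(cube):
    """States satisfying every literal of the cube."""
    return {x for x in STATES
            if all(PREDICATES[i](x) == pol for i, pol in cube)}


result = minimal_implicant_cubes(
    len(PREDICATES),
    is_implicant=lambda c: cube_states(c) <= S,
    is_incompatible=lambda c: not (cube_states(c) & S),
)
```

On the toy instance the search finds the two one-literal cubes "x > 5" and "not (x < 5)" and never issues a query for any two-literal cube, since each of them extends either an implicant or an incompatible cube.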



8 Loss of Precision under Cartesian Abstraction 



We will next analyze in what way precision may get lost through the Cartesian
abstraction. It is important to distinguish that loss from the one incurred
by the Boolean abstraction. The latter is addressed by adding new predicates
in the refinement phase.

‘Loss of precision’ is made formally precise in the following way (see [8,12]).
Given a concrete and an abstract domain, an abstraction $\alpha$ and a meaning $\gamma$,
we say that the operator $F$ does not lose precision under the abstraction to
$F^{\#}$ if $\gamma \circ F^{\#} = F \circ \gamma$ (i.e., $F$ does not lose precision on the abstract value $a$ if
$\gamma \circ F^{\#}(a) = F \circ \gamma(a)$).

In our setting, $F$ will always be instantiated by $\mathit{post}^{\#}_{\mathit{bool}}$. In this section, the
phrase ‘the Cartesian abstraction does not lose precision’ is short for ‘$\mathit{post}^{\#}_{\mathit{bool}}$
does not lose precision under the abstraction to $\mathit{post}^{\#}_{b\cdot c}$’, i.e. $\gamma_{\mathit{cartesian}} \circ \mathit{post}^{\#}_{b\cdot c} =
\mathit{post}^{\#}_{\mathit{bool}} \circ \gamma_{\mathit{cartesian}}$. We define an operator $F$ on sets to be deterministic if it maps a
singleton set to the empty set or another singleton set. The following observation
will be used in Section 9.3:



Proposition 2. If the operator $\mathit{post}^{\#}_{\mathit{bool}}$ is deterministic, then the Cartesian
abstraction does not lose precision on trivectors $(v_1, \ldots, v_n)$ such that $v_i \neq *$ for
$1 \leq i \leq n$.



Example 1. We take the (simple and somewhat contrived) example of the C
program with one statement x = y updating x by y and the set of predicates
$\mathcal{P} = \{p_1, p_2, p_3\}$ where $p_1$ expresses “x > 5”, $p_2$ expresses “x < 5” and $p_3$
expresses “y = 5”. Note that the conjunction of $\neg p_1$ and $\neg p_2$ expresses x = 5.
The image of the trivector (0,0,0) under the abstract post operator $\mathit{post}^{\#}_{b\cdot c}$ is
the trivector (*,*,0); that is, $\mathit{post}^{\#}_{b\cdot c}((0, 0, 0)) = (*,*,0)$, because $\mathit{post}^{\#}_{b\cdot c} =
\alpha_{\mathit{cartesian}} \circ \alpha_{\mathit{bool}} \circ \mathit{post} \circ \gamma_{\mathit{bool}} \circ \gamma_{\mathit{cartesian}}$ and by the following equalities.

\[ \begin{aligned}
\gamma_{\mathit{cartesian}}((0,0,0)) &= \{(0,0,0)\} &&\in \mathit{AbsDom}_{\mathit{bool}} \\
\gamma_{\mathit{bool}}(\{(0,0,0)\}) &= \{(x,y) \mid x = 5,\ y \neq 5\} &&\in 2^{\mathit{States}} \\
\mathit{post}(\{(x,y) \mid x = 5,\ y \neq 5\}) &= \{(x,y) \mid x = y,\ y \neq 5\} &&\in 2^{\mathit{States}} \\
\alpha_{\mathit{bool}}(\{(x,y) \mid x = y,\ y \neq 5\}) &= \{(1,0,0), (0,1,0)\} &&\in \mathit{AbsDom}_{\mathit{bool}} \\
\alpha_{\mathit{cartesian}}(\{(1,0,0), (0,1,0)\}) &= (*,*,0) &&\in \mathit{AbsDom}_{\mathit{cartesian}}
\end{aligned} \]

The meaning of the trivector (*,*,0) is a set of four bitvectors that properly
contains the image of the Boolean abstraction of the post operator $\mathit{post}^{\#}_{\mathit{bool}}$ applied
to the meaning of the trivector (0,0,0).

\[ \begin{aligned}
\gamma_{\mathit{cartesian}}(\mathit{post}^{\#}_{b\cdot c}((0,0,0))) &= \{(0,0,0), (1,0,0), (0,1,0), (1,1,0)\} \\
&\supsetneq \{(1,0,0), (0,1,0)\} \\
&= \mathit{post}^{\#}_{\mathit{bool}}(\gamma_{\mathit{cartesian}}((0,0,0)))
\end{aligned} \]

That is, the Cartesian abstraction loses precision by adding the bitvector (0,0,0)
(expressing x = 5 through the negation of both x < 5 and x > 5) to
the two bitvectors (1,0,0) and (0,1,0) that form the image of the Boolean
abstract post operator. (The added bitvector (1,1,0) is semantically
inconsistent and will be eliminated by standard methods in Boolean abstraction;
see [13].) Note that the concrete operator $\mathit{post}$ is deterministic; the loss of
precision in the Cartesian abstraction occurs because $\mathit{post}^{\#}_{\mathit{bool}}$ is not deterministic
($\mathit{post}^{\#}_{\mathit{bool}}(\{(0,0,0)\}) = \{(1,0,0), (0,1,0)\}$; as an aside, $\mathit{post}$ does not lose precision
under the Boolean abstraction).
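The computation of Example 1 can be replayed mechanically. The sketch below is ours and rests on one simplifying assumption: the integers are replaced by the finite range 0..10, which is enough to exhibit all three cases x < 5, x = 5 and x > 5. All function names are illustrative.

```python
from itertools import product

STAR = "*"
VALUES = range(0, 11)                  # assumed finite stand-in for the integers
STATES = set(product(VALUES, VALUES))  # a state is a pair (x, y)
PREDICATES = [
    lambda s: s[0] > 5,   # p1: x > 5
    lambda s: s[0] < 5,   # p2: x < 5
    lambda s: s[1] == 5,  # p3: y = 5
]


def bits(state):
    """The bitvector of truth values of the predicates in a state."""
    return tuple(int(p(state)) for p in PREDICATES)


def alpha_bool(states):
    return {bits(s) for s in states}


def gamma_bool(bitvectors):
    return {s for s in STATES if bits(s) in bitvectors}


def post(states):
    """Concrete post operator of the statement x = y."""
    return {(y, y) for (_x, y) in states}


def alpha_cartesian(bitvectors):
    columns = zip(*bitvectors)
    return tuple(c[0] if len(set(c)) == 1 else STAR for c in columns)


def gamma_cartesian(trivector):
    return set(product(*[(0, 1) if c == STAR else (c,) for c in trivector]))


def post_bool(bitvectors):
    return alpha_bool(post(gamma_bool(bitvectors)))


def post_bc(trivector):
    return alpha_cartesian(post_bool(gamma_cartesian(trivector)))


image = post_bc((0, 0, 0))
```

Here `image` is the trivector (*,*,0), and its meaning properly contains `post_bool({(0, 0, 0)})`, which is the loss of precision described above.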

Example 2. The next example is simpler than the previous one but it is not
relevant in the context of C programs where the transition relation is
deterministic. Nondeterminism arises in the interleaving semantics of concurrent systems.
Take a program with Boolean variables x and y (standing e.g. for ‘critical’) and
the transition relation specified by the assertion $x' = \neg y'$ (as usual, a primed
variable stands for the variable’s value after the transition). For simplicity of
presentation, we here identify states and bitvectors. The image of every nonempty
set of bitvectors under $\mathit{post}^{\#}_{\mathit{bool}}$ is the set of bitvectors $\{(0, 1), (1, 0)\}$. The image
of every trivector under $\mathit{post}^{\#}_{b\cdot c}$ is the trivector (*,*) whose meaning is the set
of all bitvectors. Here again, $\mathit{post}^{\#}_{\mathit{bool}}$ is not deterministic. Unlike in the previous
example, the concrete operator $\mathit{post}$ is not deterministic either.

Example 3. The next example shows, in the setting of a deterministic
transition relation, that precision can get lost if $\mathit{post}^{\#}_{b\cdot c}$ is applied to a trivector with
components having value *. Take a program with 2 Boolean variables $x_1, x_2$
and the transition relation specified by the statement “assume($x_1 = x_2$)”; its
post operator, defined by $\mathit{post}(V) = \{(x_1, x_2) \in V \mid x_1 = x_2\}$, is equal to its
Boolean abstraction. The image of the trivector (*,*) under $\mathit{post}^{\#}_{b\cdot c}$ is the
trivector (*,*). The image of its meaning $\gamma_{\mathit{cartesian}}((*,*))$ under $\mathit{post}^{\#}_{\mathit{bool}}$ is the set of
bitvectors $\{(0, 0), (1, 1)\}$.

We will come back to this example in Section 9.3; there, we will also consider
the general version of the same program with $n > 2$ Boolean variables $x_1, \ldots, x_n$
and the transition relation specified by the assertion $x_1 = x_2 \wedge x'_1 = x_1 \wedge \ldots \wedge x'_n =
x_n$. The image of the trivector $(*,*,*,\ldots,*)$ under $\mathit{post}^{\#}_{b\cdot c}$ is the trivector $(*,*,*,\ldots,*)$.
The image of its meaning under $\mathit{post}^{\#}_{\mathit{bool}}$ is the set of all bitvectors whose first
two components are equal.



9 Refinement for $\mathit{post}^{\#}_{b\cdot c}$

In this section, we apply standard methods from program analysis and propose 
refinements of the Cartesian abstraction; these are orthogonal to the refinement 
of the Boolean abstraction by iteratively adding new predicates. 



9.1 Control Points 

We now assume a preprocessing step on the program to be checked which
introduces new control points (locations). Each conditional statement (with, say,
condition $\phi$) is replaced by a nondeterministic branching (each nondeterministic
edge going to a different location), followed by a (deterministic) edge enforcing
the condition $\phi$ or its negation (“assume($\phi$)” or “assume($\neg\phi$)”) as a blocking
invariant, followed by a (deterministic) edge with the update statement, followed
by “joining” edges to the location after the original conditional statement.

Until now, we implicitly assumed predicates $p_\ell$ for every control point $\ell$ of the
program (expressing that a state is at location $\ell$). This would lead to a great loss
of precision under the abstraction considered above. Instead, one formalizes the
concrete domain as the sequence $(2^{\mathit{States}})^{\mathit{Loc}}$ of state spaces indexed by program
locations $\ell \in \mathit{Loc}$. Its elements are vectors $S = (S[\ell])_{\ell \in \mathit{Loc}}$ of sets of states, i.e.
$S[\ell] \in 2^{\mathit{States}}$. From now on, a state $s \in \mathit{States}$ consists only of the environment
of the data variables of the program. Accordingly, the abstract domain is the
sequence $(\mathit{AbsDom}_{\mathit{cartesian}})^{\mathit{Loc}}$.^

The post operator is now a tuple of post operators $\mathit{post}[\ell]$, one for each
location $\ell$ of the control flow graph, $\mathit{post} = (\mathit{post}[\ell])_{\ell \in \mathit{Loc}}$, where $\mathit{post}[\ell]$ is defined in
the standard way. We define the abstract post operator accordingly as the tuple

\[ \mathit{post}^{\#} = (\mathit{post}^{\#}_{b\cdot c}[\ell])_{\ell \in \mathit{Loc}}. \]

If $\ell$ is the “join” location after a conditional statement and its two
predecessors are $\ell_1$ and $\ell_2$, then $\mathit{post}[\ell](S) = S[\ell_1] \cup S[\ell_2]$. We define the $\ell$-th abstract

^ Note that we don’t need to model the procedure stack associated with the state. This 
is because the stack is implicitly present in the semantics of the Boolean program, 
and hence does not need to be abstracted by c2bp. Procedure call and return are 
handled essentially in the same way as assignment statements. See [1] for details. 
post operator, $\mathit{post}^{\#}_{b\cdot c}[\ell]((\ldots, v[\ell_1], \ldots, v[\ell_2], \ldots)) = v[\ell_1] \sqcup v[\ell_2]$, where $v \sqcup v'$ is
the least upper bound of the two trivectors $v$ and $v'$ in $\mathit{AbsDom}_{\mathit{cartesian}}$.

In all other cases, there is a unique predecessor location for $\ell$, and $\mathit{post}[\ell]$
is defined by the transition relation for the unique edge leading into $\ell$. The
$\ell$-th abstract post operator is then defined (and computed) as described in the
preceding sections, $\mathit{post}^{\#}_{b\cdot c}[\ell] = \alpha_{b\cdot c} \circ \mathit{post}[\ell] \circ \gamma_{b\cdot c}$.
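The least upper bound used at join locations is computed componentwise. The following sketch uses our own encoding (tuples over 0, 1 and the string "*") and an illustrative function name:

```python
STAR = "*"  # the value standing for {0, 1}


def join(v, w):
    """Least upper bound of two trivectors of equal length."""
    if len(v) != len(w):
        raise ValueError("trivectors must have the same length")
    # Components that agree are kept; components that differ go to *.
    return tuple(a if a == b else STAR for a, b in zip(v, w))


joined = join((0, 1), (1, 0))
```

The result (*,*) denotes all four bitvectors, although only (0,1) and (1,0) are reachable at the join; this is the loss of precision at join locations discussed next.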

Specializing the observations in Section 8, we now study the loss of precision
of $\mathit{post}^{\#}_{\mathit{bool}}[\ell]$ under the Cartesian abstraction specifically for each kind of
location $\ell$. There is no loss of precision if the edge leading into $\ell$ is one of the two
nondeterministic branches corresponding to a conditional since all data values
are unchanged.

If the edge corresponds to an “assume($\phi$)” statement (its post operator is
defined by $\mathit{post}(S) = \{s \in S \mid s \models \phi\}$ for $S \subseteq \mathit{States}$), then there is a loss of
precision exactly if $\phi$ expresses a dependence between variables (such as $x = y$ as in
Example 3); Proposition 2 applies, since the operator $\mathit{post}^{\#}_{\mathit{bool}}[\ell]$ is deterministic;
we have $\mathit{post}^{\#}_{\mathit{bool}}(V) = V \cap \alpha_{\mathit{bool}}(\{s \in \mathit{States} \mid s \models \phi\})$.

If the edge corresponds to an update statement, then (and only then) the
operator $\mathit{post}^{\#}_{\mathit{bool}}[\ell]$ may not be deterministic (even if the concrete operator $\mathit{post}[\ell]$
is deterministic; see Example 1). If $\ell$ is a “join” location, then the loss of
precision is apparent: the union of two Cartesian products gets approximated by
a Cartesian product. This loss of precision gets eliminated by the refinement of
the next section.

9.2 Disjunctive Completion 

Following standard methods from program analysis [8], we go from the abstract
domain of trivectors $\mathit{AbsDom}_{\mathit{cartesian}}$ to its disjunctive completion, which we may
model as the abstract domain of sets of trivectors, $\mathit{AbsDom}_{b\cdot c\vee} = 2^{\{0,1,*\}^n}$ with
the partial ordering $\sqsubseteq$ obtained by extending the ordering $\leq$ on trivectors, i.e.,
for two sets $V$ and $V'$ of trivectors, we have $V \sqsubseteq V'$ if for all trivectors $v \in V$
there exists a trivector $v' \in V'$ such that $v \leq v'$. For our purposes, the least
element of the abstract domain $\mathit{AbsDom}_{b\cdot c\vee}$ is the set $\{\alpha_{\mathit{cartesian}} \circ \alpha_{\mathit{bool}}(\mathit{init})\}$.

Note that the two domains $\mathit{AbsDom}_{\mathit{bool}}$ and $\mathit{AbsDom}_{b\cdot c\vee}$ are not isomorphic;
we have that $V_1 = \{(0, *), (1, *)\}$ is strictly smaller than $V_2 = \{(*,*)\}$. The
reduced quotient of $\mathit{AbsDom}_{b\cdot c\vee}$ (obtained by identifying sets with the same
meaning, such as $V_1$ and $V_2$) is isomorphic to $\mathit{AbsDom}_{\mathit{bool}}$; there, the fixpoint
test is exponentially more expensive than in $\mathit{AbsDom}_{b\cdot c\vee}$ (but may be practically
feasible if symbolic representations are used).

The abstract post operator $\mathit{post}^{\#}_{b\cdot c\vee}$ over sets of trivectors $V \in \mathit{AbsDom}_{b\cdot c\vee}$ is
the canonical extension of the abstract post operator over trivectors to a function
over sets of trivectors, i.e., for $V \in \mathit{AbsDom}_{b\cdot c\vee}$, $\mathit{post}^{\#}_{b\cdot c\vee}(V) = \{\mathit{post}^{\#}_{b\cdot c}(v) \mid v \in V\}$.
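The ordering on sets of trivectors and the canonical extension of a post operator can be sketched as follows. The encoding and all names (`leq`, `set_leq`, `meaning`, `lift`) are our own illustrative choices.

```python
from itertools import product

STAR = "*"


def leq(v, w):
    """Pointwise ordering on trivectors, generated by 0 < * and 1 < *."""
    return all(a == b or b == STAR for a, b in zip(v, w))


def set_leq(V, W):
    """V is below W if every trivector in V is below some trivector in W."""
    return all(any(leq(v, w) for w in W) for v in V)


def meaning(V):
    """Union of the Cartesian products denoted by the trivectors in V."""
    result = set()
    for v in V:
        result |= set(product(*[(0, 1) if c == STAR else (c,) for c in v]))
    return result


def lift(post_on_trivectors):
    """Canonical extension of a trivector operator to sets of trivectors."""
    return lambda V: {post_on_trivectors(v) for v in V}


V1 = {(0, STAR), (1, STAR)}
V2 = {(STAR, STAR)}
```

With these definitions `set_leq(V1, V2)` holds and `set_leq(V2, V1)` does not, while `meaning(V1)` and `meaning(V2)` coincide; this is the pair of sets that the reduced quotient identifies.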

9.3 The Focus Operation 

Assuming the refinement to the disjunctive completion, we now introduce the
focus operation (the terminology stems from an operation in shape analysis via
3-valued logic [20] that seems to us to be related). This operation can be used
to eliminate all loss of precision under Cartesian abstraction unless the Boolean
abstraction of the post operator $\mathit{post}[\ell]$ at location $\ell$ is nondeterministic (as in
Examples 1 and 2).

The idea of the focus operator can be explained by means of Example 3. Here,
the assertion defining the operator $\mathit{post}$ associated with the “assume($x_1 = x_2$)”
statement (which corresponds to the assertion “$x_1 = x_2 \wedge x'_1 = x_1 \wedge x'_2 = x_2$”)
expresses a dependence between the variables $x_1$ and $x_2$. Therefore, one defines
the focus operation focus[1,2] that, if applied to a trivector of length $n \geq 2$,
replaces the value * in its first and second components by the values 0 and 1; i.e.,

\[ \mathit{focus}[1,2]((v_1, v_2, v_3, \ldots, v_n)) =
\{(v'_1, v'_2, v_3, \ldots, v_n) \mid v'_1, v'_2 \in \{0,1\},\ v'_1 \leq v_1,\ v'_2 \leq v_2\}. \]

We extend the operation from trivectors $v$ to sets of trivectors $V$ in the canonical
way. We are now able to define the ‘focussed’ abstract post operator $\mathit{post}^{\#}_{b\cdot c\vee,[1,2]}$
as follows (refining the operator $\mathit{post}^{\#}_{b\cdot c\vee}$ given in the previous section).

\[ \mathit{post}^{\#}_{b\cdot c\vee,[1,2]}(V) = \{\mathit{post}^{\#}_{b\cdot c}(v) \mid v \in \mathit{focus}[1,2](V)\} \]

Continuing Example 3, we have that $\mathit{post}^{\#}_{b\cdot c\vee,[1,2]}(\{(*,*)\}) = \{(0,0), (1,1)\}$,
which means that the operator $\mathit{post}$ does not lose precision under the ‘focussed’
abstraction (i.e., the meaning function composed with $\mathit{post}^{\#}_{b\cdot c\vee,[1,2]}$ equals $\mathit{post}$
composed with the meaning function). Note that in general, the focus
operation may yield trivectors with components *. Continuing Example 3 for
the general version of the program with $n > 2$ Boolean variables, we have
$\mathit{post}^{\#}_{b\cdot c\vee,[1,2]}(\{(*,*,*,\ldots,*)\}) = \{(0,0,*,\ldots,*), (1,1,*,\ldots,*)\}$.
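The focus operation and the focussed post operator of Example 3 can be sketched as follows. The sketch is ours and makes two assumptions: positions are 0-based, so focus[1,2] corresponds to the position set {0, 1}, and an empty post image (the special bottom trivector, which the text glosses over) is encoded as `None` and dropped from the result.

```python
from itertools import product

STAR = "*"


def focus(positions, trivector):
    """Replace * at the given 0-based positions by both 0 and 1."""
    choices = [
        (0, 1) if (i in positions and c == STAR) else (c,)
        for i, c in enumerate(trivector)
    ]
    return set(product(*choices))


def post_bc_assume_equal(trivector):
    """Abstract post for assume(x1 = x2), identifying states and bitvectors."""
    meaning = product(*[(0, 1) if c == STAR else (c,) for c in trivector])
    survivors = [b for b in meaning if b[0] == b[1]]
    if not survivors:
        return None  # assumption: stands for the special bottom trivector
    return tuple(col[0] if len(set(col)) == 1 else STAR
                 for col in zip(*survivors))


def focussed_post(post_bc, positions, trivectors):
    """Apply post_bc to every focussed trivector, dropping bottom results."""
    images = {post_bc(w) for v in trivectors for w in focus(positions, v)}
    images.discard(None)
    return images


unfocussed = post_bc_assume_equal((STAR, STAR))
focussed = focussed_post(post_bc_assume_equal, {0, 1}, {(STAR, STAR)})
```

Without focussing, the image of (*,*) is again (*,*); with focussing on the first two components, the result is the set {(0,0), (1,1)}, and for four variables the trailing components stay at *.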

The definitions above generalize directly to focus operations in other than 
the first two and more than two components. The following observation follows 
directly from Proposition 2. 

Proposition 3. For every deterministic operator $\mathit{post}$, there exists a focus
operation such that $\mathit{post}$ does not lose precision under the ‘focussed’ Cartesian
abstraction.

The abstract post operator $\mathit{post}^{\#}_{\mathit{slam}}$ used in SLAM results from combining the
three refinements presented in Sections 9.1, 9.2 and 9.3, with the total focus
operation focus$[1, 2, \ldots, n]$. I.e., for each control point $\ell$ in the program, we
have: $\mathit{post}^{\#}_{\mathit{slam}}[\ell] = \mathit{post}^{\#}_{b\cdot c\vee,[1,\ldots,n]}[\ell]$.

By Proposition 3, for every $\ell$ such that $\mathit{post}^{\#}_{\mathit{bool}}[\ell]$ is deterministic, the
abstraction to $\mathit{post}^{\#}_{\mathit{slam}}[\ell]$ does not lose precision. A symbolic model checker such
as bebop can realize the disjunctive completion and the total focus operation
by representing and manipulating a set of trivectors $V$ always in its ‘focussed’
version, i.e. the set of bitvectors focus$[1, 2, \ldots, n](V)$. In a symbolic
representation, the gain of precision obtained by using the disjunctive completion and the
total focus operation comes at no cost. More precisely, the two Boolean
formulas representing $V$ and focus$[1, 2, \ldots, n](V)$ simplify to the same form (e.g., true
represents $\{(*,\ldots,*)\}$ as well as $\{0, 1\}^n$).

10 Conclusion

Abstraction is probably the single most important issue in model checking
software. Our work goes beyond the standard abstraction used in model checking,
the so-called Boolean abstraction. We use the abstract domain of trivectors (with
a third truth value *) in order to define a new abstraction function $\alpha_{b\cdot c}$ in terms
of Boolean and Cartesian abstraction, and an abstract post operator $\mathit{post}^{\#}_{b\cdot c}$
in terms of a Galois connection. We present a practically feasible algorithm to
compute the new abstraction, represented as a Boolean program. Previous
algorithms on related Boolean abstractions were practical only when restricted to
a small subset of states; that restriction is not possible in our setting, which
addresses programs with recursive procedures.

We have implemented both the tools c2bp and bebop. We have used c2bp 
and bebop to successfully check properties of a Windows NT device driver for 
the serial port. The driver has a few thousand lines of C code. More details and 
a case study on using SLAM tools to check properties of Windows NT device 
drivers will appear in a forthcoming paper. 

The new abstraction trades a crucial gain of efficiency with a loss of precision 
(by ignoring dependencies between the Boolean variables). We single out the 
different causes of a proper loss of precision and are able to eliminate all but 
one. It may be interesting to determine general conditions ensuring that no 
proper loss of precision can ever occur, phrased e.g. in terms of separability [18]. 

The formal machinery developed here has potentially other applications in 
designing new abstractions for model checking software, in explaining existing 
approaches to pointer analysis based on 3-valued logic [20], and in classifying 
data-flow analysis problems modeled as model checking problems [23,21].
Previous work relating the Boolean abstraction to bisimulation and temporal
properties (e.g. [5,10,6,16]) should be re-examined in the light of the new abstraction,
perhaps in terms of 3-valued transition systems [14].

Acknowledgements. We thank Todd Millstein and Rupak Majumdar for their 
work on c2bp, and Bertrand Jeannet and Laurent Mauborgne for their helpful 
comments. 

Abstract. Despite recent advances in model checking and in adapting
model checking techniques to software, the state explosion problem
remains a major hurdle in applying model checking to software. Recent
work in automated program abstraction has shown promise as a means
of scaling model checking to larger systems. Most common abstraction
techniques compute an upper approximation of the original program.
Thus, when a specification is found true for the abstracted program, it is
known to be true for the original program. Finding a specification to be
false, however, is inconclusive since the specification may be violated on
a behavior in the abstracted program which is not present in the
original program. We have extended an explicit-state model checker, Java
PathFinder (JPF), to analyze counter-examples in the presence of
abstractions. We enhanced JPF to search for “feasible” (i.e.
nondeterminism-free) counter-examples “on-the-fly”, during model checking. Alternatively,
an abstract counter-example can be used to guide the simulation of
the concrete computation and thereby check feasibility of the
counter-example. We demonstrate the effectiveness of these techniques on
counter-examples from checks of several multi-threaded Java programs.



1 Introduction 

In the past decade, model checking has matured into an effective technique for
reasoning about realistic components of hardware systems and communication
protocols. The past several years have witnessed a series of efforts aimed at
applying model checking techniques to reason about software implementations (e.g.,
Java source code [8,12,24]). While the conceptual basis for applying model
checking to software is reasonably well-understood, there are still unsettled questions
about whether effective tool support can be constructed that allows for realistic
software requirements to be checked of realistic software descriptions in a
practical amount of time. Most researchers in model checking believe that
property-preserving abstraction of the state-space will be necessary to make checking
Modular Avionics Software Cooperative Agreement, NCC-1-399, sponsored by
Honeywell Technology Center and NASA Langley Research Center.

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 284-298, 2001. 
of realistic systems practical (e.g., [6,11,19]). There are a variety of challenges
in bringing this belief to reality. This paper addresses one of those challenges,
namely, the problem of automating the analysis of counter-examples that have
been produced from abstract model checks in order to determine whether they
represent real system defects.

The work described in this paper involves the integration of two recently
developed tools for model checking Java source code: Bandera [8] and Java
PathFinder [24]. Bandera is a toolset that provides automated support for
reducing a program’s state space through the application of program slicing and
the compilation of abstract definitions of program data types. The resulting
reduced Java program is then fed to JPF, which performs an optimized
explicit-state model check for program properties (e.g., assertion violations or deadlock).
If the search is free of violations then the program properties are verified. If a
violation is found the situation is less clear. Bandera uses abstractions that
preserve the ability to prove all-paths properties (e.g., assertions or linear
temporal logic formulae). To achieve state space reduction, however, the ability
to disprove such properties is sacrificed. This means that a check of an abstracted
system may fail either because the program has an error or because the
abstractions introduce spurious executions into the program that violate the property.
The former are of interest to a user, while the latter are a distraction to the user,
especially if spurious results occur in large numbers.

Several approaches have been proposed recently for analyzing the feasibility
of counter-examples of abstracted transition-system models [5,3,4]. While our
work shares much in common with these approaches, it is distinguished from
them in four ways: (i) it treats the abstraction of a program’s data, as well as the
run-time system scheduler and the property to be checked, (ii) the feasibility of a
counter-example is judged against the semantics of a real programming language
(i.e., Java), (iii) we advocate multiple approaches for analyzing feasibility with
different cost/precision profiles, and (iv) our work is oriented toward detecting
defects in the presence of abstraction. We will demonstrate the practical utility
of an implementation of our approaches by applying them to the analysis of
counter-examples for several real multi-threaded Java applications.

Safe abstractions often result in program models where the information re- 
quired to decide conditionals is lost and hence nondeterministic choice needs to 
be used to encode such conditionals (i.e., to account for both true and false re- 
sults). Nondeterministic choice is also used to model the possible decisions that a 
thread (or process) scheduler would make. Such abstractions are safe for all paths 
properties since they are guaranteed to include all behaviors of the unabstracted 
system. The difficulty lies in the fact that they may introduce many behaviors 
that are not possible. To sharpen the precision of the abstract model (by elimi- 
nating some spurious behaviors) one minimizes the use of nondeterminism and 
it can be shown that the absence of nondeterminism equates to feasibility [23]. 
Section 3 describes how program data, the property and scheduler behavior are 
abstracted in Bandera/JPF using nondeterminism. 

JPF can perform a state-space search that is bounded by nondeterministic-
choice operations; a property violation that lies within this space has a counter-
example that is free of nondeterminism and is hence feasible. JPF can also per- 
form simulation of the concrete program guided by an abstract counter-example. 
If a corresponding concrete program trace exists then the counter-example is fea- 
sible. Section 4 describes these two techniques for analyzing program counter- 
examples that were added to JPF. Section 5 describes several defective Java 
applications whose counter-examples were analyzed using these techniques. In 
Section 6 we discuss related and future work and we conclude in Section 7. In 
the next section, we give some brief background on Bandera and JPF. 

2 Background 

Bandera [8] is an integrated collection of program analysis and transformation 
components that allows users to selectively analyze program properties and to 
tailor the analysis to that property so as to minimize analysis time. Bandera 
exploits existing model checkers, such as Spin [16], SMV [20], and JPF [24], to 
provide state-of-the-art analysis engines for checking program-property corre- 
spondence. Bandera provides support for reducing a program’s state-space via 
program slicing [15] and data abstraction. 

Data abstraction automates the reduction in size of the data domains over
which program variables range [13]. A type inference algorithm is applied to
ensure that a consistent set of abstractions is applied to program data. This
type-based approach to abstraction is complementary to predicate abstraction
approaches that reduce a program by preserving the ability to decide specific
user-defined predicates; JPF’s companion tool implements predicate abstraction
for programs written in Java [25].

Java PathFinder is a model checker for Java programs that can check any
Java program, since it is built on top of a custom-made Java Virtual Machine
(JVM), for deadlock and violations of user-defined assertions [24]. In JPF special
attention is paid to reducing the number of states, rather than execution speed
as is typical of commercial JVMs, since this is the major efficiency concern in
explicit-state model checking. Users have the ability to set the granularity of
atomic steps during model checking to: byte-codes, source lines (the default)
or explicit atomic blocks (through calls to beginAtomic() and endAtomic()
methods from a special class called Verify). A JPF counter-example indicates
how to execute code from the initial state of the program to reach the error. Each
step in the execution contains the name of the class the code is from, the file
the source code is stored in, the line number of the source file that is currently
being executed and a number identifying the thread that is executing. Using
only the thread numbers in each step, JPF can simulate the erroneous execution.

3 Program Abstraction 

Given a concrete program and a property, the strategy of verification by using
abstraction involves: (i) defining an abstraction mapping that is appropriate for
the property being verified and using it to transform the concrete program into
an abstract program, (ii) transforming the property into an abstract property,
(iii) verifying that the abstract program satisfies the abstract property, and
finally (iv) inferring that the concrete program satisfies the concrete property.
In this section, we summarize foundational issues that underlie these steps.

3.1 Data Abstraction 

The abstract interpretation (AI) [9] framework, as described in a large body of
literature, establishes a rigorous semantics-based methodology for constructing
abstractions so that they are safe in the sense that they over-approximate the
set of true executable behaviors of the system (i.e., each executable behavior
is covered by an abstract execution). Thus, when these abstract behaviors are
exhaustively compared to a specification and found to be in conformance, we can
be sure that the true executable system behaviors conform to the specification.

We present an AI, in an informal manner, as: a domain of abstract values, an
abstraction function mapping concrete program values to abstract values, and
a collection of abstract primitive operations (one for each concrete operation in
the program). For example, to abstract from everything but the fact that integer
variable x is zero or not, one could use the signs AI [1], which only keeps track
of whether an integer value is negative, equal to zero, or positive. The abstract
domain is the set of tokens {neg, zero, pos}. The abstraction function maps
negative numbers to neg, 0 to zero, and positive numbers to pos. Abstract versions
of each of the basic operations on integers are used that respect the abstract
domain values. For example, an abstract version of the addition operation for
signs is:



+abs   | zero   pos                neg
-------+----------------------------------------------
zero   | zero   pos                neg
pos    | pos    pos                {zero, pos, neg}
neg    | neg    {zero, pos, neg}   neg



Abstract operations are allowed to return sets of values to model lack of knowl- 
edge about specific abstract values. This imprecision is interpreted in the model 
checker as a nondeterministic choice over the values in the set. Such cases are 
a source of “extra behaviors” introduced in the abstract model due to its over- 
approximation of the set of behaviors of the original system. 
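The safety requirement on abstract operations can be checked mechanically on a finite sample. The following is a minimal sketch, not Bandera's generated code: the class name SignsSoundness, the enum Sign, and the method coversAllSums are hypothetical, and the abstract addition returns an explicit set rather than calling a model checker's choice primitive. It checks that, for every sampled pair of integers, the sign of the concrete sum is contained in the result of the abstract addition.

```java
import java.util.EnumSet;
import java.util.Set;

final class SignsSoundness {
    enum Sign { NEG, ZERO, POS }

    // Abstraction function: maps a concrete integer to its sign token.
    static Sign abs(int n) {
        if (n < 0) return Sign.NEG;
        return n == 0 ? Sign.ZERO : Sign.POS;
    }

    // Abstract addition following the table for signs. A multi-element
    // result models lack of knowledge about the sign of the sum.
    static Set<Sign> add(Sign a, Sign b) {
        if (a == Sign.ZERO) return EnumSet.of(b);
        if (b == Sign.ZERO) return EnumSet.of(a);
        if (a == b) return EnumSet.of(a);
        return EnumSet.allOf(Sign.class); // pos + neg or neg + pos
    }

    // Safety check on a finite sample (assumption: |x|, |y| <= bound, so the
    // sums cannot overflow): every concrete sum is covered by the abstract sum.
    static boolean coversAllSums(int bound) {
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (!add(abs(x), abs(y)).contains(abs(x + y))) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(add(Sign.POS, Sign.NEG));
        System.out.println(coversAllSums(100));
    }
}
```

A sampled check of this kind only illustrates over-approximation; it is not a proof. For Java base types the paper relies on a theorem prover to generate the abstract operations.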



3.2 Property Abstraction 

When abstracting properties, Bandera uses an approach similar to [17]. Informally,
given an AI for a variable x (e.g., signs) that appears in a proposition
(e.g., x>0), we convert the proposition to a disjunction of propositions of the
form x==a, where a ranges over the abstract values whose concrete values all
satisfy the original proposition (e.g., x==pos implies x>0, but x==neg and
x==zero do not; it follows that proposition x>0 is abstracted to x==pos). Thus,
this disjunction under-approximates the truth of a concrete proposition, ensuring
that the property holds on the original program if the abstracted property holds
on the abstract program.

public class Signs {
  public static final int NEG  = 0;
  public static final int ZERO = 1;
  public static final int POS  = 2;

  // Abstraction function: maps a concrete integer to its sign token.
  public static int abs(int n) {
    if (n < 0) return NEG;
    if (n == 0) return ZERO;
    return POS;
  }

  // Abstract addition; executed atomically since it abstracts one byte-code.
  public static int add(int a, int b) {
    int r;
    Verify.beginAtomic();
    if (a==NEG && b==NEG) r=NEG;
    else if (a==NEG && b==ZERO) r=NEG;
    else if (a==ZERO && b==NEG) r=NEG;
    else if (a==ZERO && b==ZERO) r=ZERO;
    else if (a==ZERO && b==POS) r=POS;
    else if (a==POS && b==ZERO) r=POS;
    else if (a==POS && b==POS) r=POS;
    else r=Verify.choose(2);  // sign not determined: NEG, ZERO, or POS
    Verify.endAtomic();
    return r;
  }
}



Fig. 1. Java Representation of signs AI (excerpts) 
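The property abstraction of Section 3.2 can be illustrated with a small sketch. It is not Bandera's implementation: the names PropertyAbstraction and abstractProposition are hypothetical, and a finite sample of integers stands in for the infinite set of concrete values, so the result is only indicative. A token is kept in the disjunction only if every sampled concrete value that maps to it satisfies the proposition.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.IntPredicate;

final class PropertyAbstraction {
    enum Sign { NEG, ZERO, POS }

    // Abstraction function of the signs AI.
    static Sign abs(int n) {
        if (n < 0) return Sign.NEG;
        return n == 0 ? Sign.ZERO : Sign.POS;
    }

    // Returns the tokens a such that every sampled concrete value mapped to a
    // satisfies p. Assumption: the sample [-bound, bound] stands in for all
    // integers; bound must be at least 1 so that each token is sampled.
    static Set<Sign> abstractProposition(IntPredicate p, int bound) {
        Set<Sign> tokens = EnumSet.allOf(Sign.class);
        for (int v = -bound; v <= bound; v++) {
            if (!p.test(v)) tokens.remove(abs(v)); // one failing value disqualifies the token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(abstractProposition(x -> x > 0, 100));  // x>0
        System.out.println(abstractProposition(x -> x >= 0, 100)); // x>=0
        System.out.println(abstractProposition(x -> x > 1, 100));  // x>1
    }
}
```

For x>1 the resulting set is empty, because the value 1 maps to pos but does not satisfy the proposition. The abstracted proposition is then false everywhere, which is the situation in which a feasible but not defective counter-example (Section 4.4) can arise.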



3.3 Scheduler Abstraction 

Analyzing concurrent systems requires safe modeling of the possible scheduling
decisions that are made in executing individual threads. Since software is often
ported to operating systems with different scheduling policies, a property
checked under a specific policy would be potentially invalid when that system is
executed under a different policy. To address this, the approach taken in existing
model checkers is to implement what amounts to the most general scheduling
policy (i.e., nondeterministic choice among the set of runnable threads). Properties
verified under such a policy will also hold under any more restrictive policy.
Fairness constraints are supported in most model checkers to provide the ability
to more accurately model realistic scheduling policies.

The Java language has a relatively weak specification for its thread scheduling
policy. Threads are assigned priorities and a scheduler must ensure that “all
threads with the top priority will eventually run” [2]. Thus, a model checker that
guarantees progress to all runnable threads of the highest priority will produce
only feasible schedules; JPF implements this policy.
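The most general scheduling policy amounts to exploring every interleaving of the runnable threads. The sketch below is an illustration only and not how JPF is implemented: the class Interleavings, the encoding of thread steps as characters, and the property in violates are all assumptions made for the example, and thread priorities are not modeled. It enumerates all interleavings of two threads and checks each schedule against a simple property.

```java
import java.util.ArrayList;
import java.util.List;

final class Interleavings {
    // Enumerates every interleaving of the step sequences a and b that
    // preserves the order of steps within each thread.
    static List<String> all(String a, String b) {
        List<String> out = new ArrayList<>();
        extend(a, 0, b, 0, new StringBuilder(), out);
        return out;
    }

    private static void extend(String a, int i, String b, int j,
                               StringBuilder cur, List<String> out) {
        if (i == a.length() && j == b.length()) {
            out.add(cur.toString());
            return;
        }
        if (i < a.length()) { // nondeterministic choice 1: run thread a
            cur.append(a.charAt(i));
            extend(a, i + 1, b, j, cur, out);
            cur.setLength(cur.length() - 1);
        }
        if (j < b.length()) { // nondeterministic choice 2: run thread b
            cur.append(b.charAt(j));
            extend(a, i, b, j + 1, cur, out);
            cur.setLength(cur.length() - 1);
        }
    }

    // Hypothetical property: each 'A' step asserts that no 'D' step has run.
    static boolean violates(String schedule) {
        int d = schedule.indexOf('D');
        return d >= 0 && d < schedule.lastIndexOf('A');
    }

    public static void main(String[] args) {
        for (String s : all("AA", "D")) {
            System.out.println(s + " violates=" + violates(s));
        }
    }
}
```

A property that holds on every enumerated schedule also holds under any policy that permits only a subset of them, which is the argument made above for checking under the most general policy.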



3.4 Abstraction Implementation 

In Bandera, generating an abstract program involves the following steps: the user
selects a set of AIs for a program’s data components, then type inference is used
to calculate the abstractions for the remaining program data, then the Java
class that implements each AI’s abstraction function and abstract operations
is retrieved from Bandera’s abstraction library, and finally the concrete Java
program is traversed, and concrete literals and operations are replaced with calls
to classes that implement the corresponding abstract literals and operations.

Figure 1 shows excerpts of the Java representation of the signs AI. Abstract
tokens are implemented as integer values, and the abstraction function and
operations have straightforward implementations as Java methods. For Java base
types, the definitions of abstract operations are automatically generated using
a theorem prover (see [13] for details). Nondeterministic choice is specified by
calls to Verify.choose(n), which JPF traps during model checking, returning
nondeterministic values between 0 and n inclusive. Abstract operations execute
atomically (via calls to Verify.beginAtomic() and Verify.endAtomic()) since
they abstract concrete byte-codes (e.g., Signs.add() abstracts iadd).

Fig. 2. Model Checking on Choose-free Paths

4 Finding Feasible Counter-examples 

We have seen in the previous section that, if a specification is true for the
abstracted program, it will also be true for the concrete program. However, if the
specification is false for the abstracted program, the counter-example may be the
result of some behavior in the abstracted program which is not present in the
original program. It takes deep insight to decide if an abstract counter-example
is feasible (i.e., corresponds to a concrete computation). We have developed two
techniques that automate tests for counter-example feasibility: model checking
on choose-free paths and abstract counter-example guided concrete simulation.



4.1 Choose-Free State Space Search

We enhanced the JPF model checker with an option to look only at paths that do
not refer to instructions that introduce nondeterminism (i.e., a Verify.choose()
call). When a choose occurs, the search algorithm of the model checker backtracks.
The approach exploits the following theorem from [23]:

Theorem. Every path in the abstracted program where all assignments are
deterministic is a path in the concrete program.

In [23], the theorem is used to judge a counter-example feasible, whereas we 
use it to bias the model checker to search for feasible counter-examples. The 
theorem ensures that paths that are free of nondeterminism correspond to paths 
in the concrete program (a more general definition of deterministic paths can be 
found in [10]). It follows that if a counter-example is reported in a choose-free 
search then it represents a feasible execution. If this execution also violates the 
property, then it represents a feasible counter-example. 

     class App{                              class App{
      public static void main(...){           public static void main(...){
[1]    new AThread().start(); ...              new AThread().start(); ...
[2]    int i=0;                                int i=Signs.ZERO;
[3]    while(i<2){ ...                         while(Signs.lt(i,Signs.POS)){
[4]     assert(!Global.done);                   assert(!Global.done);
[5]     i++;                                    i=Signs.add(i,Signs.POS);
       }}}                                     }}}
     class AThread extends Thread{           class AThread extends Thread{
[6]   public void run(){ ...                  public void run(){ ...
       Global.done=true;                       Global.done=true;
      }}                                      }}

Fig. 3. Simple Example of Concrete (left) and Abstracted (right) Code

Consider an abstracted program (whose state space is sketched in Figure 2).
Black circles represent states where some assertion is violated. Dashed lines
represent transitions that refer to choose, while solid lines refer to instructions
other than choose. Model checking on choose-free paths will report only the error
path 1-3-6, although path 1-2-4 leads to a state where the assertion is false (and
it may correspond to an execution in the concrete program).
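The effect of the choose-free option can be sketched as a depth-first search that refuses to follow transitions labeled as choose. The code below is a simplified illustration and not JPF's search: the class ChooseFreeSearch and its methods are hypothetical, and the graph built in demo() is an assumed state space that is merely consistent with the paths 1-3-6 and 1-2-4 mentioned above, since the exact edges of Figure 2 are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ChooseFreeSearch {
    // A transition to `target`; `choose` marks a nondeterministic-choice step.
    private static final class Edge {
        final int target;
        final boolean choose;
        Edge(int target, boolean choose) { this.target = target; this.choose = choose; }
    }

    private final Map<Integer, List<Edge>> edges = new HashMap<>();
    private final Set<Integer> errors = new HashSet<>();

    void addEdge(int from, int to, boolean choose) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, choose));
    }

    void addError(int state) { errors.add(state); }

    // Returns the path to the first error state found by DFS, or an empty list.
    // With chooseFree set, the search backtracks whenever it meets a choose edge.
    List<Integer> findError(int init, boolean chooseFree) {
        Deque<Integer> path = new ArrayDeque<>();
        if (dfs(init, chooseFree, new HashSet<Integer>(), path)) {
            return new ArrayList<>(path);
        }
        return new ArrayList<>();
    }

    private boolean dfs(int s, boolean chooseFree, Set<Integer> seen, Deque<Integer> path) {
        if (!seen.add(s)) return false;      // state already explored
        path.addLast(s);
        if (errors.contains(s)) return true; // counter-example found
        for (Edge e : edges.getOrDefault(s, Collections.<Edge>emptyList())) {
            if (chooseFree && e.choose) continue; // do not execute a choose
            if (dfs(e.target, chooseFree, seen, path)) return true;
        }
        path.removeLast();
        return false;
    }

    // Assumed example graph: states 4 and 6 violate the assertion.
    static ChooseFreeSearch demo() {
        ChooseFreeSearch g = new ChooseFreeSearch();
        g.addEdge(1, 2, false);
        g.addEdge(2, 4, true);  // reaching state 4 requires a choose
        g.addEdge(1, 3, false);
        g.addEdge(3, 5, true);
        g.addEdge(3, 6, false);
        g.addError(4);
        g.addError(6);
        return g;
    }

    public static void main(String[] args) {
        System.out.println(demo().findError(1, false));
        System.out.println(demo().findError(1, true));
    }
}
```

On this assumed graph the unrestricted search reports the path through the choose edge, while the choose-free search reports only the path that is guaranteed to be feasible by the theorem.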

We also note that our technique could be implemented in any model checker,
but the design of JPF made this modification particularly easy. JPF is essentially
a special-purpose JVM that interprets each byte code in the compiled version of
a Java program. Since choose operations are represented as static method calls,
trapping and processing those operations specially only required modification
of the code for the static method invocation byte-code. We made sure that the
search on choose-free paths does not introduce deadlocks (choose instructions
are interpreted as infinite self loops).

Consider checking the fragment of code on the left of Figure 3 against the
assertion at line 4, where initially Global.done is false; the abstracted code
(using signs for i) is shown to the right of the original. In the abstracted
program, nondeterminism is introduced through method lt that implements the
abstract operation for <: after one pass through the while loop, the abstract
value of i becomes pos and the value returned by Signs.lt(i, Signs.POS) can
be either true or false. However, the abstract program does expose a choose-free
counter-example: if the thread that is an instance of AThread executes line 6
before the main thread begins the execution of the while loop, the assertion in
line 4 is violated when the body of the loop is executed for the first time (and
the abstract value of i is zero). This counter-example does not contain
nondeterministic choices, since the value returned by Signs.lt(i, Signs.POS), when
i is zero, is uniquely true.

4.2 Abstract Counter-example Guided Concrete Simulation 

In Bandera, the generation of an abstracted program is automatic and is done
in such a way that there is a clear correspondence between the concrete and
abstracted program: for each line in the concrete program, there is a single line in
the abstracted program. Since byte-codes execute atomically, for each “concrete”
byte-code, there is a set of “abstract” byte-codes that execute atomically in
JPF. This property of Bandera abstraction, together with the fact that all Java
variables have known initial values, allows for simulation of the concrete program,
based on an abstract counter-example.

Fig. 4. Model Checking and Refinement

This is done in JPF by executing the steps in the abstract trace. For clarity,
we will discuss the simulation in terms of the execution of lines of Java source
code, but JPF can also perform simulation on a byte-code level. Each step contains
information about the thread to be run next and the line of the counter-example.
At each step of the concrete execution, JPF checks that the concrete line to be
executed corresponds to the abstract line in the counter-example. If the lines
match throughout the simulation then the abstract trace is feasible; otherwise,
the abstract trace is spurious. To check whether the feasible trace is a
counter-example, we must also check whether it violates the property.

Consider again the example from Figure 3 where the result of model checking
the abstracted program is a counter-example where Global.done is set to true
after the loop in the main thread is executed two times. This means that the
assertion is reachable (and violated) by the (abstract) trace

    1-2-3-4-5-3-4-5-3-4

in the main thread. While this is clearly possible in the abstract program (since,
after the abstract value of i becomes pos, the condition at line 3 can be
nondeterministically true or false), it is not possible in the concrete program. To
see this, we simulate the steps from the abstract trace on the concrete program:
after executing the loop two times, the value of i is 2 so the exit condition of
the loop is true and the loop is exited. At this point a line mismatch is detected
and the simulation stops.
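The line-matching check can be sketched for the main thread of the concrete program of Figure 3. This is a hand-written illustration under simplifying assumptions and not JPF's simulation mode: the class GuidedSimulation is hypothetical, thread identifiers are omitted, only the main thread is simulated, and the assertion at line 4 is not evaluated. The method firstMismatch replays an abstract trace of line numbers on the concrete program and reports the index of the first step whose line differs from the line the concrete execution would take.

```java
final class GuidedSimulation {
    static final int EXIT = 0; // pseudo line number: the loop has been left

    // Executes `line` of the concrete main thread and returns the next line.
    // The single-element array i holds the concrete value of variable i.
    static int step(int line, int[] i) {
        switch (line) {
            case 1: return 2;                    // new AThread().start()
            case 2: i[0] = 0; return 3;          // int i=0
            case 3: return i[0] < 2 ? 4 : EXIT;  // while(i<2)
            case 4: return 5;                    // assert(!Global.done), not evaluated here
            case 5: i[0]++; return 3;            // i++
            default: throw new IllegalArgumentException("unknown line " + line);
        }
    }

    // Returns the index of the first step of the abstract trace that the
    // concrete program cannot follow, or -1 if the whole trace is replayable.
    static int firstMismatch(int[] trace) {
        int[] i = {0};     // Java gives all data a known initial value
        int expected = 1;  // the concrete program starts at line 1
        for (int k = 0; k < trace.length; k++) {
            if (trace[k] != expected) return k;
            expected = step(trace[k], i);
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] spurious = {1, 2, 3, 4, 5, 3, 4, 5, 3, 4};
        int[] feasible = {1, 2, 3, 4};
        System.out.println(firstMismatch(spurious));
        System.out.println(firstMismatch(feasible));
    }
}
```

For the abstract trace 1-2-3-4-5-3-4-5-3-4 the mismatch is found at the last step (index 9): after two iterations i is 2, so the concrete program leaves the loop where the abstract trace re-enters it.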

It is possible to detect the infeasibility of an abstract trace earlier, using a
technique similar to forward analysis (e.g., [22]): when simulating each step on
the concrete program, we also check the correspondence between concrete and
abstract values. This can be done in JPF by abstracting the values of variables
(e.g., via calls to Signs.abs()) in the concrete simulation and comparing them
to the abstract values in the counter-example.

[1] x=1;            x=Signs.POS;
[2] y=x+1;          y=Signs.add(x,Signs.POS);
[3] assert(x<y);    assert((x==Signs.NEG && y==Signs.ZERO)
                        || (x==Signs.NEG && y==Signs.POS)
                        || (x==Signs.ZERO && y==Signs.POS));

Fig. 5. Example of Spurious Error Introduced by Property Abstraction



4.3 Methodology 

Our methodology for model checking and abstraction involves the integration of
the above two techniques as illustrated in Figure 4. The input (concrete) program
and the specification are abstracted (using abstractions from Bandera’s library)
as described in Section 2 and the transformed program is fed to a model checker.
If the result of model checking is true, then the specification is true for the
concrete program. If the result is false, we re-run the model checker to search
only choose-free paths in the model. If the model checker finds a choose-free
counter-example, it is reported to the user; otherwise we perform counter-example
guided simulation. If the simulation succeeds, a counter-example is reported, but
if a mismatch is detected then abstractions need to be refined. The refinement
involves modifying the selection of abstractions guided by the counter-example
reported in the first run of the model checker. For a discussion on how the
abstractions could be refined, see Section 6.
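The decision procedure described above can be summarized as a small driver. This is a sketch of the control flow only: the class AnalysisWorkflow, the enum Verdict, and the three functional parameters are hypothetical stand-ins for the model checker, the choose-free re-run, and the guided simulation, none of which is implemented here.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.Supplier;

final class AnalysisWorkflow {
    enum Verdict {
        PROPERTY_HOLDS,              // no abstract counter-example
        CHOOSE_FREE_COUNTEREXAMPLE,  // feasible by the theorem of Section 4.1
        SIMULATED_COUNTEREXAMPLE,    // abstract trace replayed on the concrete program
        REFINE_ABSTRACTION           // mismatch detected: the trace is spurious
    }

    // modelCheck:         first run on the abstracted program
    // chooseFreeCheck:    second run, restricted to choose-free paths
    // feasibleOnConcrete: guided simulation of the first counter-example
    static <T> Verdict analyze(Supplier<Optional<T>> modelCheck,
                               Supplier<Optional<T>> chooseFreeCheck,
                               Predicate<T> feasibleOnConcrete) {
        Optional<T> cex = modelCheck.get();
        if (!cex.isPresent()) return Verdict.PROPERTY_HOLDS;
        if (chooseFreeCheck.get().isPresent()) return Verdict.CHOOSE_FREE_COUNTEREXAMPLE;
        return feasibleOnConcrete.test(cex.get())
                ? Verdict.SIMULATED_COUNTEREXAMPLE
                : Verdict.REFINE_ABSTRACTION;
    }

    public static void main(String[] args) {
        Verdict v = AnalysisWorkflow.<String>analyze(
                () -> Optional.of("1-2-3-4-5-3-4-5-3-4"), // abstract counter-example
                () -> Optional.<String>empty(),           // no choose-free counter-example
                trace -> false);                          // simulation detects a mismatch
        System.out.println(v);
    }
}
```

The choose-free run is attempted before simulation because any counter-example it returns needs no further feasibility check.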



4.4 Discussion 

In general, the result of model checking an abstract program is false either be- 
cause the concrete program does not satisfy the property (in which case the 
counter-example is feasible and indicates a real defect), or because the abstrac- 
tion is not suitable for checking the property. In the latter case, the abstract 
counter-example can be one of the following: 

— not feasible, as a result of over-approximation of the behavior of the concrete
program (e.g., the spurious counter-example of the program in Figure 3);

— feasible but not defective, as a result of the under-approximation of the
property to be checked. This case is illustrated by the code in Figure 5,
where both x and y are abstracted to signs. The predicate in the assertion
is abstracted in such a way that if the assertion is true in the abstracted
program, it follows that it is true in the concrete program. Abstract trace
1-2-3 violates the assertion, since after step 2, both x and y are pos. However,
in the concrete program, the assertion is true.

In our experience this second case is rare, since in Bandera users are guided to
make abstraction selections that are able to decide both the truth and falsity of
the propositions used in the property to be checked. Only when such a selection
is impossible can a feasible, but not defective, counter-example arise.
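The feasible but not defective case of Figure 5 can be reproduced directly. The sketch below is illustrative: the class FeasibleNotDefective and its method names are hypothetical, and the abstract addition is inlined for the only case that occurs (pos plus pos, which the signs table maps deterministically to pos).

```java
final class FeasibleNotDefective {
    static final int NEG = 0, ZERO = 1, POS = 2;

    // Abstract version of x<y from Figure 5: true only for token pairs whose
    // concrete values always satisfy x<y.
    static boolean abstractLess(int x, int y) {
        return (x == NEG && y == ZERO)
            || (x == NEG && y == POS)
            || (x == ZERO && y == POS);
    }

    // Concrete program of Figure 5; returns the value of the assertion.
    static boolean concreteAssertion() {
        int x = 1;
        int y = x + 1;
        return x < y;
    }

    // Abstracted program of Figure 5; returns the value of the abstracted
    // assertion. No choose is executed on this path.
    static boolean abstractAssertion() {
        int x = POS;
        int y = POS; // result of Signs.add(x, Signs.POS) when x is POS
        return abstractLess(x, y);
    }

    public static void main(String[] args) {
        System.out.println("concrete assertion holds: " + concreteAssertion());
        System.out.println("abstract assertion holds: " + abstractAssertion());
    }
}
```

The abstract trace is choose-free and therefore feasible, yet the concrete assertion holds. This is why a feasible trace must still be checked against the concrete property before it is reported as a defect.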

We note that both choose-free model checking and abstract counter-example
guided concrete simulation can be directly applied to an executable program slice.
If a trace is feasible in the sliced program, it is also feasible in the original
program [15]. We also note that the techniques presented here can be applied
for checking safety properties expressed in any universal temporal logic.

5 Experience with Defective Java Applications 

To illustrate the potential benefits of the techniques described in the previous
section, we applied them to several small to medium-size multi-threaded Java
applications. These applications used both lock synchronization and
condition-based synchronization (i.e., wait/notify). The systems are: RAX (Remote
Agent experiment) [25], a Java version of a component extracted from an
embedded spacecraft-control application; Pipeline [7], a generic framework for
implementing multi-threaded staged calculations; RWVSN, Lea’s [18] generic
readers-writers synchronization framework; and DEOS [21,25], the scheduler
from a real-time executive for avionics systems that was translated from C++.
The following table gives some basic measures of the size of each system; SLOC
stands for the number of source lines of code.



Program    SLOC   Classes   Methods   Fields   Threads
RAX          55         4         8        7         3
Pipeline    103         5        10        7         5
RWVSN       590         5        43       10         5
DEOS       1443        20        91       92         6



Most of these programs use the basic features of Java and its concurrency
constructs; however, the RWVSN application uses abstract classes, inheritance,
and java.util.Vector.

The RAX and DEOS examples had known errors that we checked for. For 
the Pipeline and RWVSN examples we seeded faults in the program. For 
example, we dropped a negation (!) in one program and changed <= into < 
(simulating an off-by-one error) in the other. It is interesting to note that not 
all seeded faults could be detected given the properties we checked for, so we 
altered the faults until we generated a property violation. 

We now describe several model checks for these systems and the automated
analysis of the resulting counter-examples. Full details for the examples and
model checks are available at http://www.cis.ksu.edu/~pcorina/case-studies.

5.1 Description of Experiments 

We model checked the RAX example to detect deadlocks using two different
abstractions. Figure 6 shows excerpts from the original and the generated
abstract Java program. The abstraction of the program was driven by our
selection that the Event.count field should be abstracted with signs. Bandera’s
abstraction type inference determined that the local count variables in the
FirstTask.run() method should also be abstracted. Running JPF on this
abstracted system detects a deadlock and produces a 74-step counter-example.

Original program:

[ 1] class Event {
[ 2]   int count=0;
[ 3]   public synchronized void wait_for_event() {
[ 4]     try{wait();}
[ 5]     catch(InterruptedException e){};
       }
[ 6]   public synchronized void signal_event() {
[ 7]     count = count + 1;
[ 8]     notifyAll();
       }}
[ 9] class FirstTask extends Thread {
[10]   Event event1,event2;
[11]   int count=0;
[12]   public void run(){
[13]     count = event1.count;
[14]     while(true){
[15]       if (count == event1.count)
[16]         event1.wait_for_event();
[17]       count = event1.count;
[18]       event2.signal_event();
       }}}

Abstracted program:

     class Event {
       int count = Signs.ZERO;
       public synchronized void wait_for_event() {
         try{wait();}
         catch(InterruptedException e){};
       }
       public synchronized void signal_event() {
         count = Signs.add(count,Signs.POS);
         notifyAll();
       }}
     class FirstTask extends Thread {
       Event event1,event2;
       int count = Signs.ZERO;
       public void run(){
         count = event1.count;
         while(true){
           if (Signs.eq(count,event1.count))
             event1.wait_for_event();
           count = event1.count;
           event2.signal_event();
       }}}

Fig. 6. RAX Program with Deadlock (excerpts)



Analysis of this counter-example reveals that it is spurious. After 39 steps in the
counter-example the trace reaches the conditional at line 15. In the real system,
the branch condition is false, but due to the nondeterminism of Signs.eq()
for positive parameters the abstract system enters the conditional. JPF is able
to find a 40-step choose-free counter-example. It is clear that the presence of
spurious counter-examples is closely related to the property being checked, the
program, and the abstractions selected. We reran our model checks, changing
the abstraction for the Event.count field to record information about the evenness
or oddness of its values. This produced a 128-step counter-example, but JPF
was unable to find a choose-free counter-example. At this point, we ran JPF in
simulation mode guided by the 128-step counter-example and, while this
counter-example did contain nondeterministic choices, it was shown to be feasible.

The Pipeline example consists of an application that uses the methods of
a Pipeline class to manage execution of a multi-threaded staged computation.
The application constructs and starts execution of a pipeline, calls stop() to end
execution of the pipeline, and calls add() to provide input to the computation.
We model checked a precedence property for the Pipeline system stating that
“no pipeline stage (i.e., thread) will terminate until the stop method is called”.
Since JPF does not currently support checking of temporal properties, we
encoded this using a boolean variable, stopCalled, set to true when the stop()
method had been called, and embedded assert(stopCalled) at the return point
of the stage run methods. This example was abstracted by identifying a loop
index variable that controlled the number of times the add() method was called
and abstracting it to signs. Type inference determined that 5 additional fields
and local variables also needed abstraction. Checking the property on the
abstracted system detected an error on a 168-step counter-example. JPF found a
69-step choose-free counter-example that is similar to the example in Figure 3
in that it occurred on the first iteration of an abstracted loop.

RWVSN consists of an application that extends Lea’s RWVSN class [18] to
implement an object with a readers-writers synchronization policy. That object
is then shared by several threads that read and write through the RWVSN
interface. We checked that access by a reader excluded access by a writer by setting
a boolean variable, in_writer, in the writer’s critical section and resetting it
upon exit, and embedding assert(!in_writer) in the reader’s critical section.
Abstraction was applied to 3 integer fields of the RWVSN class, abstracting them
to signs. Checking the property on the abstracted system detected an error in
179 steps. JPF found a 76-step choose-free counter-example.

The DEOS system has been the subject of several recent case studies in
model checking code [21,25,13]; we performed the abstraction and analysis as
described in [13]. The property being checked is an assertion that encodes a test
for time partitioning in the scheduler component of the system. We used
dependence analysis driven by the location of the assert statement and the data values
it referenced to identify a single field (out of 92) as influencing the property. We
selected the signs AI for that field and type inference determined that 2 more
fields should be abstracted. Checking the property on the abstracted system
detected an error in 471 steps. JPF found a 312-step choose-free counter-example.

5.2 Discussion 

While these programs represent a range of different patterns of concurrency (e.g., 
clients and server, pipelines, and peer-groups) and the larger examples are real 
applications, we do not claim that our results generalize to a broader class of 
multi-threaded Java programs. We do, however, believe the results suggest that 
the counter-example analysis techniques we have developed have merit and can 
significantly reduce the burden users face when analyzing counter-examples from 
checks of abstracted systems. 

The data clearly show that counter-examples can be reduced significantly in 
length; this alone makes it easier to diagnose the program fault. The fact that 
counter-examples are guaranteed to be feasible helps focus the user’s attention 
on only those counter-examples for which analysis will lead to fault detection. 

It should come as no surprise that a choose-free model check is faster than 
a typical model check since it is essentially a depth-bounded model check. Most 
model checkers can do depth-bounded search and in fact this often allows for 
detection of significantly shorter counter-examples. The key difference lies in the 
fact that a choose-free search uses an adaptive depth-bound that is based on 
encountering nondeterministic choice operators. This guarantee of not executing 
a choice operator is what assures counter-example feasibility. Without that a 
naive depth-bounded search may include execution of a choice operator. 

Finally, we observe that choose-free search can be an effective way to exploit
more aggressive abstraction approaches. The application of source-level
predicate abstraction techniques to DEOS and RAX is described in detail in
[25]. In that work a predicate abstraction and an invariant for DEOS and 4
different predicate abstractions for RAX were used to produce abstract models
that preserved both truth and falsity of the properties being checked. In contrast,
the checks described in this paper sacrifice precision for more aggressive
abstraction and state-space reduction, while choose-free search enables the recovery of
feasible counter-examples.



6 Related and Future Work 

In our previous work [13], we focused on the specification, generation, selection
and compilation of abstractions for Java programs. In this paper, we detail
techniques for analyzing counter-examples and provide evidence for their usefulness
on several non-trivial Java programs.

Most existing work on counter-example analysis is oriented towards the goal
of verification; counter-example analysis drives abstraction refinement for the
purpose of proving a property. In contrast, our work is oriented toward defect
detection. Our biasing of the model checker yields a complete coverage of the
sub-space of guaranteed feasible paths in the system rather than simply assessing
the feasibility of a single counter-example from an unbiased model check.

Our simulation technique works because JPF maintains a correspondence
between the concrete and abstracted programs and Java defines default initial
values for all data (thus a program has a single initial state). It is possible to
develop more general simulation techniques that handle multiple initial states,
but we believe these are not necessary for Java. One such technique [5] uses
forward analysis and performs a symbolic simulation of the concrete system using
predicates that characterize the program data values. Since it does not keep a
correspondence between concrete and abstract transitions, rather than determine
the next concrete state it must compute (at each step of the simulation) the set
of all possible next concrete states. This method, which is implemented in SMV,
is limited to finite-state systems.

In SLAM [3], sequential C programs are abstracted into boolean programs;
symbolic execution is used to map abstract counter-examples to concrete
executions. INVEST [4] and interactive abstractions [22] use theorem proving to rule
out spurious counter-examples. Backward analysis is used to obtain information
to refine the abstractions. Unlike our approach, these tools/techniques are not
concerned with property abstraction or scheduling information.

We believe that the methods described in these papers are complementary to
our techniques. For example, we can use backward analysis to obtain feedback
for refinement of abstractions. Backward analysis computes pre-images of the
violating abstract state over the given trace. For the spurious counter-example
of Figure 3, after the body of the loop is executed two times, the value of the
loop condition is true, which means that the concrete value of i is believed to be
less than 2. The analysis would discover that this happens because the value of
i before the assignment at line 5 is believed to be less than 1 (which is not true
in the concrete program, where the value of i is exactly 1). This implies that a
new abstraction to be selected for variable i has to include a new token for 1
(e.g., the signs abstraction should be replaced with a range(0,1) abstraction [13]).

We note that both choose-free search and counter-example guided simulation
techniques could be implemented in any explicit-state model checker. For example,
Bandera [8] generates Promela models for Spin that can easily be adapted to
perform choose-free search. Path simulation simply requires the ability to
associate the steps of the concrete and abstract program and to simulate the concrete
program. One can already do this by hand using Spin’s simulation facilities, but
automating the process would greatly ease its use. We also note that, although
we set our presentation in the context of Bandera’s abstraction, other forms of
data abstraction, like JPF’s predicate abstraction, would also be treated
properly. By that we mean that a path through the predicate abstracted code that
is choose-free or that can be mapped to a concrete execution is feasible.



7 Conclusion 

In this paper, we have suggested two approaches for analyzing counter-examples 
produced by model checks of abstracted programs. These approaches have the 
advantage of being very fast (i.e., choose-free search is depth-bounded and the 
cost of simulation is related to the length of the counter-example). Based on 
experimentation with an implementation of these techniques in a Java model 
checking tool we have also found the techniques to be capable of detecting guar- 
anteed feasible counter-examples in nearly every case. This enables aggressive 
abstractions to be applied without losing the ability to detect errors, thereby 
minimizing the need for refinement of abstractions. This implementation treats 
not only abstraction of program data, but also of thread scheduling policies, 
and the property to be checked. Finally, we believe that these techniques can 
be combined with other counter-example analysis methods to provide a suite of 
tools that vary in cost and in their ability to precisely analyze counter-examples.
Such a tool suite would be a useful addition to any model checking tool.

Abstract. This paper describes the architecture of the LOOP tool, which
is used for reasoning about sequential Java. The LOOP tool translates
Java and JML (a specification language tailored to Java) classes into
their semantics in higher order logic. It serves as a front-end to a
theorem prover in which the actual verification of the desired properties
takes place. Also, the paper discusses issues related to logical theory
generation.



1 Introduction 

Being able to verify programs has always been a major topic in computer science.
For this purpose many artificial, mathematically clean, programming languages
have been introduced, since reasoning about real, dirty, programming languages
is far from easy. Due to the progress in the field of theorem proving, and the
increase in computing power, it has become feasible now to reason about real
programming languages. Specialised tools, like the LOOP tool, also contribute
to this feasibility.

Using theorem provers for program verification is becoming more and more com-
mon. There are numerous advantages to the use of theorem provers for doing
proofs over doing proofs by hand: theorem provers are very precise, they can do 
lots of, often boring, proof steps in a few seconds, they keep track of the list of 
proof obligations which are still open, and do a lot of bureaucratic administra- 
tion for the user. This is especially relevant in the area of program verification 
where usually many cases have to be distinguished and the proofs themselves 
are not so difficult (in comparison to mathematics). 

Since Java is one of the most popular programming languages around, it is 
also of particular interest for researchers. Many research groups are focusing on 
specification and verification of Java programs at source-code level, using various 
tools, e.g.

— ESC/Java [23] is an extended static checker for Java (including threads),
which can detect certain runtime errors at compile time, by using a built-in
theorem prover. By using this checker, many (but not all) errors can be found
without user interaction. ESC/Java uses a specification language which has
recently been integrated with JML [15].

— Jive [17] is a verification environment in which a user can write Java source-
code as well as its specification. It is connected with a theorem prover, cur-
rently this is PVS [19], which is used to verify proof obligations. Jive’s user
interface takes care of the interaction with PVS. With Jive, one is currently
able to reason about the sequential kernel of Java, but not about exceptions,
a crucial part of Java.

— In the Bali project a deep embedding of a semantics for Bali, a Java snbset, 
in Isabelle [20] has been developed, with various meta-theoretical results: 
formalisation of the type system to prove type-safety [18], soundness and 
completeness of an appropriate Hoare logic. This project is not primarily 
focussed on verification of concrete programs. 

— The KeY project [1] aims at integrating formal specification and verification 
tools into the software engineering process. Within this project a dynamic 
logic for JavaCard, Java’s subset for smart card programming, has been 
developed. The verification tool for this project is still under development. 

— The Bandera project [5] extracts a non-finite-state model from Java source- 
code, and applies program analysis, abstraction and transformation tech- 
niques to it, in order to get a finite-state model. This model is abstractly 
represented, enabling the generation of concrete models for various model 
checking tools. The tools developed in this project are applied to several 
Java case studies. 

The LOOP project [21] focuses on specification and verification of sequential 
Java. For this part of Java a formal semantics has been developed, based on 
coalgebras. JML is the language used to specify Java classes. For the kernel part 
of JML — invariants, behaviour specifications, including modifiable clauses — a 
formal semantics is being developed. 

Within the LOOP project a special purpose compiler, the LOOP tool, has
been built which incorporates these semantics of Java and JML. The output
of the LOOP tool is a series of logical theories for the theorem provers PVS and 
Isabelle. This gives the verifier a choice of proof tool. Typically, when a user wants 
to reason about a Java class, (s)he uses the LOOP tool for the translation, and 
reasons about the program in the language semantics using a theorem prover. 
The LOOP approach makes use of existing, general purpose theorem provers, 
and concentrates on building a dedicated front-end for a particular application 
area, because developing a (dedicated) theorem prover is a project on its own. 
Reasoning goes via a combination of applying semantic-based Hoare logic rules 
and automatic rewriting. Several papers about the underlying semantics and 
logic have already been published [14,3,9,10,12,13]. This paper focuses on the 
tool itself. 

Automatic translation of Java classes into a series of logical theories has
several advantages over manual translation. Done by hand, the translation is
boring, error-prone, and time consuming. A translated Java class is usually much
larger in size than the original. A tool will do such a translation within a few
seconds, without complaining, and without errors (if the translation function is
implemented correctly). Another advantage is that with tool support the gen-
erated theories can be fine-tuned to achieve more efficiency in proofs, which is
hardly possible when generating theories by hand.

In comparison to the projects mentioned above there are the following dis- 
tinguishing features of the LOOP project. 

— The ESC/Java tool involves no user interaction, is fast and easy to use, but
can only detect a limited class of errors. With the LOOP tool the user has
to engage in interactive program verification, using the back-end proof tool,
but there are no inherent limitations to what can be (dis)proved. Thus, the
ESC/Java and LOOP tools are complementary and can very well be used
in combination, especially because they use the same specification language
(namely JML).

— The Jive approach is closest to the LOOP approach. It differs however in 
its syntax-based approach, via a dedicated user interface, allowing reasoning 
about the actual program text (and not about its meaning). The specification 
language of the Jive tool resembles JML. It is too early to judge and compare 
these two approaches in actual examples. 

— The Bandera project aims at verification of Java programs (especially in- 
volving threads) using model checkers. Similar to the LOOP project, output 
is generated for back-end tools that do the actual verification. However, 
model checkers instead of theorem provers are used. A general problem with 
multi-threaded Java is that the level of granularity is not well-defined. 

— The Bali and KeY projects have not been used (yet) on substantial concrete 
examples of Java programs, making a comparison premature. 

This paper is organised as follows. Section 2 describes the modular archi- 
tecture of the LOOP tool. Section 3 describes some issues related to the theory 
generation. Section 4 briefly describes how to use the LOOP tool, and finally 
Section 5 gives an overview of possible application areas. 

2 The Architecture of the LOOP Tool 

As shown in Figure 1, the LOOP tool accepts three languages with object-oriented
features, namely CCSL, Java, and JML. It serves as a front-end for a theorem
prover which, in this figure, is PVS. The LOOP tool can also serve as a front-end
for Isabelle. The theorem prover is used to actually prove properties about the 
classes in the input languages, on the basis of the logical theories generated by 
the LOOP tool. 



2.1 Input Languages 

Historically, the first input language is CCSL [7,22], short for Coalgebraic Class 
Specification Language. It is an experimental specification language, which is 
jointly developed at the University of Dresden and the University of Nijmegen. 
With this language one can write class specifications in an object-oriented way,
i.e. one can write specifications with attributes, methods, and constructors. It
also supports inheritance and subtyping. CCSL uses a coalgebraic semantics
for classes and supports tailor-made modal operators for reasoning about class
specifications. In this paper we concentrate on the input languages Java and
JML, and refer to [7,22] for more information on CCSL.

Figure 1. Overview of the LOOP project

The second input language is Java, one of the most popular object-oriented
programming languages. Our semantics for sequential Java, i.e. Java without
threads, closely follows the Java Language Specification (JLS) [6]. More infor-
mation about this semantics can be found in [14,3,9,10,12].

The third input language is JML, short for Java Modeling Language. JML 
is a behavioural interface specification language, tailored to Java, and primarily
developed at Iowa State University. It is designed to be easy to use for program- 
mers with limited experience in logic. Therefore, it extends Java such that a 
user can write (class) invariants, and pre- and post-conditions for methods and 
constructors within the source code, making use of Java expressions (extended 
with various logical operators) to formulate the desired properties. All extensions 
of JML are enclosed between Java’s comment markers, and will therefore not 
influence the program’s behaviour. A typical JML specification for a method m 
looks as follows. 

  /*@ behavior
    @   requires: <precondition>
    @ modifiable: <fields>
    @    ensures: <postcondition>      // when terminating normally
    @    signals: (E) <postcondition>  // when terminating abruptly
    @                                  // because of exception E
    @*/
  void m() { ... }


2.2 LOOP Tool Internals



Figure 2 gives an enlarged view of the LOOP tool, for the case where the tool
accepts Java classes (and interfaces)^. The first three passes can
be viewed as the first part of a standard Java compiler.




Figure 2. The “exploded” view on the LOOP tool 



Standard techniques are used to build a lexer and parser, following the defi- 
nition of the Java syntax in the JLS. During parsing, unknown types — class and 
interface types — are not resolved. These types are stored as (tagged) strings in 
the abstract syntax tree, and resolved in a later pass. 

The inheritance resolver establishes relations between classes by resolving the 
unknown types. Also, in this pass overridden methods and hidden fields in Java 
are internally marked as overridden and hidden. 
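
The deferred resolution of class and interface types described above can be sketched as follows. This is only a minimal OCaml illustration under stated assumptions, not the code of the LOOP tool: the type `java_type`, the `Unresolved` tag, the lookup table, and the function `resolve` are hypothetical names chosen for this example.

```ocaml
(* Hypothetical sketch: types as a parser might leave them, with class and
   interface names kept as tagged strings until a later pass resolves them. *)
type java_type =
  | Int
  | Boolean
  | Unresolved of string   (* name as seen during parsing *)
  | Class of string        (* resolved: a class in the input *)
  | Interface of string    (* resolved: an interface in the input *)

type kind = IsClass | IsInterface

(* The resolver pass looks each name up in a table built from all input
   declarations; a name that is not declared is reported as an error. *)
let resolve (table : (string * kind) list) (t : java_type) : java_type =
  match t with
  | Unresolved name ->
    (match List.assoc_opt name table with
     | Some IsClass -> Class name
     | Some IsInterface -> Interface name
     | None -> failwith ("unknown type: " ^ name))
  | resolved -> resolved

(* Example table for an input consisting of two classes and one interface. *)
let table = [ ("A", IsClass); ("B", IsClass); ("Runnable", IsInterface) ]
```

Keeping the unresolved case as an ordinary constructor means the parser never needs to know which other classes exist, at the price of one extra pass over the syntax tree.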

The type checker computes the type of every expression occurring in the 
input classes. A type checker is needed, since the overloading mechanism of Java 
is more powerful than the ones of PVS and Isabelle. Therefore, definitions in 
PVS and Isabelle are often provided with explicit types. 

At this point a standard Java compiler would generate a bytecode file for 
each class. Instead, the LOOP tool translates each Java class into its semantics, 
in the form of a series of logical theories. These theories are produced internally 
in an abstract way using abstract logic syntax (ALS), see Subsection 3.3 below. 

Finally, to come to concrete theories, a last pass, a pretty printer, is im-
plemented to translate the ALS into concrete logic syntax. Abstract theories
provide a powerful technique to produce concrete theories for different theorem
provers^. Implementing such a pass is fairly simple. We have implemented two
of these, one for PVS and one for Isabelle.

^ In this paper ‘Java class’ may also be read as ‘Java interface’. If not, it will explicitly
be mentioned.

For JML the LOOP tool works similarly. Since JML is an extension of Java, the
grammar of Java is extended, and since the logic of JML is based upon Java
expressions, the type checker is extended as well. The theories for the specifications
are also abstractly generated. Notably, the pretty printer components of the
LOOP tool are shared by the three input languages: CCSL, Java, and JML.



2.3 Implementation Details 

The OCaml language [16] is used to implement the LOOP tool. The OCaml distribution
comprises a number of tools, such as lex and yacc, a debugger, and a (native-code)
compiler. OCaml is an ML dialect, supporting object-oriented features. It is a strongly
typed (functional) programming language, i.e. every expression has a type which
is automatically determined by the compiler. One great advantage of using a 
strongly typed language is that many potential program errors are caught by the 
compiler as type errors. The penalty for this is that one has to set up appropri- 
ately structured types first. This forms a non-trivial part of the implementation 
of the LOOP tool. 

Internally, a Java/JML class and its members (fields, methods, and construc-
tors) occurring in the input are stored as instances of certain OCaml classes. As
root classes, we use two OCaml class types, top_iface_type for CCSL/Java/JML
classes, and top_member_type for CCSL/Java/JML members. The “top_” class
types contain common information, such as the name of the class, and the fields
and methods defined in it. For each input language we introduce specialised class
types, to deal with language specific properties.



                top_iface_type
                 /          \
     java_iface_type      ccsl_iface_type
            |
     jml_iface_type



Similarly, top_member_type has specialised class types for CCSL, Java, and
JML members. Every “_iface_type” class type is mutually recursive with its
“_member_type” variant. These types have a non-trivial structure, involving sub-
typing and mutual recursion in various forms.

^ The ALS involves standard constructions from higher order logic. Thus, it is in 
principle easy to generate output also for any theorem prover that provides (at 
least) higher order logic, e.g. COQ [2].

Due to the object-oriented nature of the LOOP tool, it is easy to adapt the
theory generation for the different input languages. Each “_iface_type” class
type has a method that invokes the theory generation, which is overridden in
specialised types.
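
The layout of mutually recursive class types with an overridden theory generation method might look roughly like the sketch below. Only the names top_iface_type and top_member_type come from the text; the methods, the classes `member`, `java_iface` and `jml_iface`, and the theory names are assumptions made for illustration, and the real class types are considerably richer.

```ocaml
(* Hypothetical sketch of two mutually recursive class types. *)
class type top_iface_type = object
  method name : string
  method members : top_member_type list
  method theories : string list        (* names of the theories to generate *)
end
and top_member_type = object
  method member_name : string
  method owner : top_iface_type
end

(* A member knows the class it belongs to. *)
class member (n : string) (o : top_iface_type) = object
  method member_name = n
  method owner = o
end

(* Java classes: theory generation yields a theory for the bodies. *)
class java_iface (n : string) = object
  val mutable members : top_member_type list = []
  method name = n
  method members = members
  method add_member (m : top_member_type) = members <- m :: members
  method theories = [ n ^ "_Prelude"; n ^ "_Interface"; n ^ "_Bodies" ]
end

(* JML classes reuse the Java part and override the theory generation:
   a specification theory takes the place of the bodies theory. *)
class jml_iface (n : string) = object
  inherit java_iface n
  method! theories = [ n ^ "_Prelude"; n ^ "_Interface"; n ^ "_Spec" ]
end

(* Code written against the root class type works for every input language. *)
let generate (c : top_iface_type) : string list = c#theories

let a = new java_iface "A"
let () = a#add_member (new member "m" (a :> top_iface_type))
let spec_a = new jml_iface "A"
```

Calling `generate (a :> top_iface_type)` and `generate (spec_a :> top_iface_type)` then dispatches to the Java and the JML variant respectively, which is the kind of adaptation by overriding that the paragraph above refers to.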

Some non-technical details: the LOOP tool currently consists of over 58,000 
lines of OCaml code (including documentation) of which 25,000 lines are used 
to implement the Java part, and 8,000 lines are used to extend it to JML. To 
implement CCSL 12,000 lines are used, and 13,000 lines of code are shared. Work 
on the LOOP tool started in 1997 and is still ongoing.

3 Generated Theories 

This section focuses on some typical issues and problems related to theory gen- 
eration. The contents of the theories themselves are too complicated to describe 
here in detail, and are not directly relevant. See [9,8] for more information. 



3.1 Mutually Recursive Classes and Circular Theories 

The LOOP tool translates each Java (and JML) class into its semantics in higher 
order logic as a series of logical theories. It is not possible to generate this se- 
mantics as one single theory, since at several places in the source-code references 
to other classes might occur. Having such references might lead, in that case, to 
circular theories, via importings. This is not allowed in PVS and Isabelle. Hence, 
they have to be disentangled. 

In Java source-code, references to other classes can occur at three places: 

1. at inheritance level, but this does not lead to circularities, since a standard
Java compiler detects if a class is a subclass of itself, e.g. class A extends
B and class B extends A is illegal;

2. at interface level. The signatures of members of class A contain occurrences
of class B, and vice versa;

3. at implementation level. In a method (or constructor) body in class A the
class B occurs, e.g. via creating an object of class B or a field access of an
object of type B, and vice versa.

For a concrete (toy) example of mutual recursion between Java classes, con- 
sider classes A and B in Figure 3, where the signature of method m in A has an 
occurrence of class B, and the signatures of both methods in B have occurrences 
of class A. Moreover, method m in A creates an object of class B, and method n 
assigns a value to a field of b (cast to A).

To prevent the generated theories from being circular, the semantics of each 
Java class is divided into three^ tailor-made theories: 

^ Actually, the semantics is spread over eight theories, but due to space restrictions
only the theories generated to handle mutual recursion are presented here.

class A {                            class B extends A {
  int i;                               A n() { return new A(); }
  void m (B b) { b = new B(); }        void m (B b) { ((A)b).i = 1; }
}                                    }

Figure 3. Mutually recursive Java classes

1. the Prelude theory defines a special type for objects and arrays of that 
class. This type can be the null reference or a reference pointing to a certain 
memory location where an object or a (multi-dimensional) array of objects 
of that class is stored. 

2. the Interface theory defines the types of fields, methods, and possibly the 
constructors of that class. There is also a reference to the direct superclass, 
and superinterfaces, if any. 

3. the Bodies and rewrites theory gives semantics to the method and construc- 
tor bodies. Also, auto-rewrite lemmas are generated, which can be used con- 
veniently during proofs (and hence reduce the proof interaction for a user). 



A’s Prelude                      B’s Prelude

A’s Bodies and rewrites          B’s Bodies and rewrites

Figure 4. Generated theories and their importings for the classes A and B from
Figure 3



An Interface theory imports all Prelude theories of those classes which are 
used in its members’ signatures. Moreover, it also imports the Interface theory 
of its direct superclass, since the members of superclasses should be accessible. 
Importings of superclasses are transitive. A Bodies and rewrites theory imports
the Interface theories of all classes whose static type occurs in its
method and constructor bodies. Note that there are no circularities.
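
The absence of circularities can be checked mechanically on the example. The following OCaml sketch encodes the importings for classes A and B from Figure 3 according to the rules above and searches for an import cycle. The type `theory`, the functions `imports` and `has_cycle`, and the assumption that each theory also imports the lower-level theory of its own class are illustrative choices, not taken from the LOOP tool.

```ocaml
(* Hypothetical sketch: the three theories per class and their importings. *)
type theory = Prelude of string | Interface of string | Bodies of string

let imports : theory -> theory list = function
  | Prelude _ -> []
  (* Signatures in A mention B; A has no explicit superclass. *)
  | Interface "A" -> [ Prelude "A"; Prelude "B" ]
  (* Signatures in B mention A and B; B extends A. *)
  | Interface "B" -> [ Prelude "A"; Prelude "B"; Interface "A" ]
  (* Both bodies use objects of static type A and of static type B. *)
  | Bodies _ -> [ Interface "A"; Interface "B" ]
  | Interface _ -> []

(* Depth-first search; path holds the theories on the current import chain. *)
let rec has_cycle imports path t =
  List.mem t path || List.exists (has_cycle imports (t :: path)) (imports t)

let all =
  [ Prelude "A"; Prelude "B"; Interface "A"; Interface "B";
    Bodies "A"; Bodies "B" ]

let acyclic = not (List.exists (has_cycle imports []) all)

(* For contrast: one theory per class, each importing the other, is circular. *)
let naive_imports = function "A" -> [ "B" ] | "B" -> [ "A" ] | _ -> []
let naive_circular = has_cycle naive_imports [] "A"
```

Every import in the split version points to a strictly lower layer, or to the same layer along the acyclic inheritance relation, which is why the search finds no cycle.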



3.2 Similarity Between Theories 

The kind (and number) of theories that are produced by the LOOP tool depends
on the input language. Each language has its own specific properties, e.g. the
theories for JML describe properties of implementations, whereas the theories
for Java describe concrete implementations. Though there are differences be-
tween the kind of theories generated, the three input languages have theories in
common. Actually, this similarity forms the reason for having one tool for the
three languages.

For a JML class, possibly defining specification fields and methods, an ex-
tended Interface theory is generated, containing these extra fields and methods.
And instead of a theory with semantics for bodies of methods and constructors,
a theory with properties of implementations resulting from JML’s specification
constructs, such as behaviour specifications and invariants, is generated.

Also, when having both a Java implementation and a JML specification,
another theory is generated to relate both of them, via a suitable translation of
coalgebras, making the interface types match. This makes it possible to formulate
the intended proof obligation, namely that the Java implementation satisfies the
JML specification.



3.3 Abstract Theories 

The LOOP tool generates logical theories for PVS and Isabelle. Both these tools
offer a higher order logic but use a different syntax. The LOOP tool first generates
theories as an abstract syntax tree, which abstracts away from these differences in 
syntax. This tree is built from types that cover common constructs used in higher 
order logic, such as function abstraction and application, and quantification. 

  type expression =
    | Expression of formula
    | Application of (expression * expression)
    | Tuple of expression list
    | ...
  and formula =
    | True
    | Not of formula
    | ...

Secondly, a theorem prover specific unparser, or pretty printer, is applied 
to the abstract theories in order to generate concrete theories. Writing such an 
unparser is fairly easy, as illustrated below, where it is done for PVS. 

  let rec pp_pvs_expression = function
    | Expression form -> pp_pvs_formula form
    | Application (func, arg) ->
        pp_pvs_expression func;
        print_string "(";
        pp_pvs_expression arg;
        print_string ")"
    | ...
  and pp_pvs_formula = function
    | True -> print_string "TRUE"
    | Not form ->
        print_string "NOT ("; pp_pvs_formula form; print_string ")"
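
To illustrate how one abstract syntax tree can feed several unparsers, the sketch below gives a cut-down, self-contained version of the types with two printers that return strings. It is an illustration under assumptions rather than the LOOP tool's code: the `Const` constructor is added so that the example has leaves, the function names are hypothetical, and the concrete PVS-like and Isabelle-like notations are simplified.

```ocaml
(* Cut-down abstract logic syntax; Const is an addition for this example. *)
type expression =
  | Const of string
  | Expression of formula
  | Application of expression * expression
and formula =
  | True
  | Not of formula

(* PVS-like concrete syntax: application is written f(x). *)
let rec pvs_expression = function
  | Const c -> c
  | Expression f -> pvs_formula f
  | Application (func, arg) ->
    pvs_expression func ^ "(" ^ pvs_expression arg ^ ")"
and pvs_formula = function
  | True -> "TRUE"
  | Not f -> "NOT (" ^ pvs_formula f ^ ")"

(* Isabelle-like concrete syntax: application is juxtaposition. *)
let rec isa_expression = function
  | Const c -> c
  | Expression f -> isa_formula f
  | Application (func, arg) ->
    "(" ^ isa_expression func ^ " " ^ isa_expression arg ^ ")"
and isa_formula = function
  | True -> "True"
  | Not f -> "~ (" ^ isa_formula f ^ ")"

(* One abstract term, two concrete renderings. *)
let e = Application (Const "p", Expression (Not True))
let as_pvs = pvs_expression e       (* "p(NOT (TRUE))" *)
let as_isabelle = isa_expression e  (* "(p ~ (True))" *)
```

Supporting a further prover then amounts to writing one more pair of mutually recursive functions over the same types.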

3.4 Size and Speed

Translating the classes in the example in Subsection 3.1 leads to 12 Kb of PVS
theories and 14 Kb of Isabelle theories for class A, and respectively 17 Kb and 
21 Kb for class B. The main difference in size between the PVS and Isabelle 
theories is caused by the fact that in Isabelle each definition, when imported 
from another theory, has to be qualified with its theory name. A substantial part 
of these generated files consists of comments, explaining what is happening. 

In general, the size of the generated theories strongly depends on the number 
of superclasses. Every inherited method is repeated in the Interface theory, and 
its body’s semantics is recalculated^ and added to the Bodies and rewrites theory. 
Thus, inheritance leads to a substantial increase in the size of the generated 
theories. 

The LOOP tool can easily handle a large number of classes. Running the LOOP
tool on the JDK 1.0.2 API (consisting of 215 classes, forming together over 1
Mb of source-code) takes only five seconds to parse and type check. Producing
the series of logical theories takes about 50 seconds longer, most of which is spent
writing the concrete theories to file^.

4 Use Scenarios 

For a successful run of the LOOP tool a Java class has to be type correct as defined 
by the JLS^. Type incorrectness of the input will lead to abrupt termination of 
the LOOP tool, via an error message. A successful run leads to a series of (PVS 
or Isabelle) type correct logical theories. 

The LOOP tool requires that every Java class that is used in an implementa- 
tion (and specification) occurs in the input series. This requirement is a design 
decision, since automatic loading of classes can lead to uncontrolled loading
of too many classes. In practice it works best to cut away, for a verification, 
unnecessary details, i.e. class definitions and method definitions not used in the 
final program. In this way the user can restrict the size of the generated theories. 
It is of importance to keep this size as small as possible, to limit the time spent 
on loading and type checking by the theorem prover. 

Once translated, the desired properties of a Java class can be verified using a 
theorem prover. It is up to the user how to specify these properties: either JML 
specifications are used (which have the advantage of automatic translation), or 
hand-written specifications are formulated in the language semantics (in higher 
order logic). The verification of these properties goes via a combination of ap- 
plying (tailor-made) rewrite lemmas and definitions, and of applying Hoare logic 
rules [13,10]. 

^ This recalculation is necessary in order to reason about late binding in Java, which 
influences the behaviour of the method execution, see [9] for details. 

^ Experiments were done on a Pentium III 500 MHz, running Linux.

^ A JML class has to be type correct following [15].

The LOOP tool can also generate batch files. Such a batch file contains the
necessary steps for a theorem prover to take for type checking the generated
theories, and for rerunning proofs. Hence, batch files are useful, and reduce user
interaction. They are also used for rerunning old examples after new releases (of
LOOP, PVS, or Isabelle), for compatibility checks.



4.1 An Example 

The example below illustrates the ease of using JML behaviour specifications.
The constructor and methods all have a normal_behavior specification, which
informally says that if the precondition holds the method is only allowed to terminate
normally in a state where the postcondition holds. The LOOP tool expresses the
behaviour specifications in a specialised Hoare logic for JML. Reasoning about
methods goes via applying suitable Hoare rules, as described in [13].

  class A {
    boolean b1, b2;

    /*@ normal_behavior
      @   requires: true;
      @ modifiable: b1, b2;
      @    ensures: b == (b1 & b2);
      @*/
    A (boolean b) {
      b1 = b2 = b;
    }

    /*@ normal_behavior
      @   requires: true;
      @ modifiable: b1;
      @    ensures: b1 != b2;
      @*/
    void m() { b1 = !b2; }
  }

  class B {
    /*@ normal_behavior
      @   requires: true;
      @    ensures: \result == false;
      @*/
    boolean n() {
      A a = new A(true);
      a.m();
      return a.b1 & a.b2;
    }
  }



In this example, it is easy to see that the constructor and the methods all 
terminate normally. Note that class B does not declare a constructor; a default 
constructor is created (see [6, § 8.2]) together with a default normal_behavior
specification (see [15, p. 48]). Thus, to prove these classes correct a user has to 
validate four proof obligations, of which three — of the constructors of A and B, 
and of method m — are straightforward and can be established with automatic 
rewriting. 
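
Because the example only involves booleans, the postconditions can also be sanity-checked by exhaustive testing before any proof is attempted. The OCaml sketch below models classes A and B as a record with mutable fields and checks each ensures clause over all inputs. This is a hypothetical model written for this illustration; it is not how the LOOP tool works, and it is no substitute for the proofs in the theorem prover.

```ocaml
(* Hypothetical OCaml model of classes A and B from the example. *)
type a = { mutable b1 : bool; mutable b2 : bool }

let new_a (b : bool) : a = { b1 = b; b2 = b }  (* constructor A(boolean b) *)
let m (o : a) : unit = o.b1 <- not o.b2        (* method m of class A *)
let n () : bool =                              (* method n of class B *)
  let o = new_a true in
  m o;
  o.b1 && o.b2

let bools = [ false; true ]

(* ensures clause of the constructor: b == (b1 & b2) *)
let ctor_ok =
  List.for_all (fun b -> let o = new_a b in b = (o.b1 && o.b2)) bools

(* ensures clause of m: b1 != b2, checked from every possible field state;
   the modifiable clause is checked by requiring that b2 keeps its value. *)
let m_ok =
  List.for_all (fun x ->
    List.for_all (fun y ->
      let o = { b1 = x; b2 = y } in
      m o;
      o.b1 <> o.b2 && o.b2 = y) bools) bools

(* ensures clause of n: \result == false *)
let n_ok = (n () = false)
```

Such a check only covers the finite state space of this toy example, whereas the Hoare logic proofs apply to arbitrary classes.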

Proving correctness of a method containing method calls, like method n, 
can be established in two ways: (1) reasoning with their implementations, and 
(2) reasoning with their specifications. In general, the latter option is better, 
since it reduces the time spent on proving termination of the method calls and 
it enables modular verification. 

For method n in class B the proof obligation reads like

  “Spec_A holds for an arbitrary implementation”
      implies
  “Spec_B holds for the given implementation”.

The specification of class A is used to obtain the specifications of its members.
These specifications are used to establish the postcondition of method n. Using
the composition rule from [13], this proof is straightforward as well.



4.2 Strategies 

Currently, most of the reasoning about JML-annotated Java programs is done
in PVS. As experience is growing, more and more ingredients of the verification
work are being incorporated in tailor-made proof strategies in PVS. These turn
out to be extremely useful and substantially decrease the amount of interaction
needed for proof construction.



5 Application Areas 

The LOOP tool is applied in those areas where the effort spent on specification
and verification is justifiable. One can think of areas where economical and
security aspects play an important role, such as the development of safety-critical 
systems, and integrated software development relying on formal methods. 

Java’s class library has many classes which are interesting for verification. 
Verifying classes from this class library can be useful, since many people use 
these classes to write their applications. The LOOP tool has been successfully 
applied to verify a non-trivial invariant property of the frequently used Vector 
class [11]. 

Also in the area of smart cards formal verification is becoming necessary, due 
to the higher standards the market demands. Smart cards are being issued in 
large numbers for security-sensitive applications, which justifies the application 
of formal methods: any error detected before issuing saves lots of money. The
LOOP tool is applied in the area of JavaCard based smart cards, especially to the
JavaCard API (for specification and verification [4]), and to its applets (smart
card programs), which are stored on the smart card. This work is part of a larger
project, which is supported by the European Union^.

6 Conclusions 

We have presented the modular architecture of the LOOP tool, which is used to
reason about Java. The LOOP tool translates the implementation and specifi-
cation of Java classes into their semantics in higher order logic. Internally, this
semantics is abstractly generated as a series of theories, which can easily be con-
cretised as theories for different theorem provers. The actual verification is done
in the theorem prover.

^ See: http://www.verificard.org

Doing full program verification for real-life programming languages is becom- 
ing feasible in more cases, but it still requires a major investment of time and 
resources. Such a verification technique can (only) be applied in areas where the 
presence of errors has a major impact on the money it costs to repair them. With 
a compiler like the LOOP tool, users can concentrate on the real work (specifica-
tion and verification), without having to care about the actual modelling.

Abstract. The ability to analyze a digital system under conditions of
uncertainty is important in several application domains. The problem is 
naturally described in terms of search in the powerset of the automaton 
representing the system. However, the associated exponential blow-up 
prevents the application of traditional model checking techniques. This 
work describes a new approach to searching powerset automata, which 
does not require the explicit powerset construction. We present an ef- 
ficient representation of the search space based on the combination of 
symbolic and explicit-state model checking techniques. We describe sev- 
eral search algorithms, based on two different, complementary search 
paradigms, and we experimentally evaluate the approach. 



Keywords: Explicit-State Model Checking, Symbolic Model Checking, Binary 
Decision Diagrams, Synchronization Sequences 

1 Introduction 

The ability to analyze digital systems under conditions of uncertainty is ex-
tremely useful in various application domains. For hardware circuits, it is im-
portant to be able to determine homing, synchronization and distinguishing
sequences, which allow one to identify the status of a set of circuit flip-flops. For
instance, synchronization sequences, i.e. sequences that will take a circuit from
an unknown state into a completely defined one [9], are used in test design and
equivalence checking. Similar problems are encountered in automated test gener-
ation, e.g. to determine what sequence of inputs can take the (black-box) system
under test into a known state. In Artificial Intelligence, reasoning with uncertainty
has been recognized as a significant problem since the early days. For instance,
the Blind Robot problem [11] requires planning the activity of a sensorless agent,
positioned in any location of a given room, so that it will be guaranteed to
achieve a given objective.

Such problems are naturally formulated as search in the powerset of the space 
of the analyzed system [9]: a certain condition of uncertainty is represented as 
the set of indistinguishable system states. However, search in the powerset space
yields an exponential blow-up. A straightforward application of symbolic model
checking techniques is hardly viable: the symbolic representation of the powerset 
automaton requires exponentially more variables than needed for the analyzed 
system. On the other hand, approaches based on explicit-state search methods 
tend to suffer from the enumerative nature of the algorithms. 

In this work, we propose a new approach to the problem of searching the
powerset of a given nondeterministic automaton which does not require the
explicit construction of the powerset automaton. The approach can be seen as
expanding the relevant portion of the state space of the powerset automaton on
demand, and makes it possible to tackle rather complex problems in practice.
The approach combines techniques from symbolic model checking, based on the use
of Binary Decision Diagrams (BDDs) [3], with techniques from explicit-state
model checking. We represent sets of sets of states in a fully symbolic way,
and we provide for the efficient manipulation of such data structures. Using
this representation, we tackle the problem of finding an input sequence which
guarantees that only states in a target set will be reached for all runs,
regardless of the uncertainty on the initial condition and on nondeterministic
machine behaviors. We present several algorithms based on two different search
paradigms. The fully-symbolic paradigm performs a breadth-first search by
representing the frontier as a single symbolic structure. In the semi-symbolic
paradigm, search is performed in the style of explicit-state model checking,
considering at each step only a (symbolically represented) element of the
search space, i.e. a set of states. Both search paradigms are based on fully
symbolic primitives for the expansion of the search space, thus overcoming the
drawbacks of an enumerative approach.

The algorithms return failure if and only if the problem admits no solution;
otherwise a solution is returned. Depending on the style of the search, the
solution can be guaranteed to be of minimal length. We also present an
experimental evaluation of our algorithms, showing that the two paradigms are
complementary and make it possible to tackle quite complex problems
efficiently.

The paper is structured as follows. In Section 2 we introduce the problem. In 
Section 3 we describe the techniques for the implicit representation of the search 
space, and in Section 4 we present the semi-symbolic and fully-symbolic search 
paradigms. In Section 5 we present an experimental evaluation of our approach. 
In Section 6 we discuss some related work and draw the conclusions. 

2 Intuitions and Background 

We consider nondeterministic finite state machines. $\mathcal{S}$ and
$\mathcal{A}$ are the (finite) sets of states and inputs of the machine.
$\mathcal{R} \subseteq \mathcal{S} \times \mathcal{A} \times \mathcal{S}$ is
the transition relation. We use $s$ and $s'$ to denote states of $\mathcal{S}$,
and $a$ to denote input values. In the following, we assume that a machine is
given in the standard BDD-based representation used in symbolic model checking
[10]. We call $x$ and $x'$ the vectors of current and next state variables,
respectively, while $\alpha$ is the vector of input variables. We write
$\alpha = a$ for the BDD in the $\alpha$ variables representing the input value
$a$. When clear from the context, we confuse the set-theoretic and symbolic
representations. For instance, we use equivalently the False BDD and
$\emptyset$. We write $\mathcal{R}(x,\alpha,x')$ for the BDD representing the
transition relation to stress the dependency on BDD variables. We say that an
input $a$ is acceptable in $s$ iff there is at least a state $s'$ such that
$\mathcal{R}(s,a,s')$ holds. The acceptability relation is represented
symbolically by $Acc(x,\alpha) = \exists x'.\mathcal{R}(x,\alpha,x')$. An input
sequence is an element of $\mathcal{A}^*$. We use $\epsilon$ for the 0-length
input sequence, $\pi$ and $\rho$ to denote input sequences, and $\pi;\rho$ for
concatenation.

Fig. 1. The example automaton

In this paper we tackle the problem of finding an input sequence that, if
applied to the machine from any initial state in
$\mathcal{I} \subseteq \mathcal{S}$, guarantees that the machine will reach a
target set of states $\mathcal{G} \subseteq \mathcal{S}$, regardless of
nondeterminism. For explanatory purposes we use the simple system depicted in
figure 1. A circuit is composed of two devices, $x$ and $y$. The circuit is
malfunctioning ($c = 0$), and the reason is that exactly one of the devices is
faulty (i.e. $x = 0$ or $y = 0$). It is possible to fix either device (inputs
$Fix_x$ and $Fix_y$), but only if a certain precondition $p$ is met. Fixing the
faulty device has the effect of fixing the circuit ($c = 1$), while fixing the
other one does not. Fixing either device has the uncertain effect of spoiling
the fixing precondition (i.e. $p = 0$). $P_{fix}$ has the effect of restoring
the fixing precondition ($p = 1$). Each state is given a number, and contains
all the propositions holding in that state. For instance, state 1 represents
the state where device $x$ is the reason for the fault, and fixing is possible.
Given that only one device is faulty, $x = 0$ also stands for $y = 1$, and vice
versa.

The problem is finding an input sequence which fixes the circuit, taking the
machine from any state in $\mathcal{I} = \{1,2,3,4\}$ (where the circuit is
faulty, but we do not know if the reason is in device $x$ or $y$, nor if fixing
is possible) to the target set $\mathcal{G} = \{5,7\}$ (the circuit is fixed,
and the fixing condition is restored). A possible solution is the input
sequence $P_{fix}; Fix_x; P_{fix}; Fix_y; P_{fix}$. Figure 2 shows why this is
the case. The initial uncertainty is in that the system might be in any of the
states in $\{1,2,3,4\}$. This set is represented in figure 2 by a dashed line.
We call such a set an uncertainty state as in [2]. Intuitively, an uncertainty
state expresses a condition of uncertainty about the system, by collecting
together all the states which are indistinguishable while analyzing the system.
An uncertainty state is an element of $Pow(\mathcal{S})$, i.e. the powerset of
the set of states of the machine. The first input value, $P_{fix}$, makes sure
that fixing is possible. This reduces the uncertainty to the uncertainty state
$\{1,3\}$. Despite the remaining uncertainty (i.e. it is still not known which
component is responsible for the circuit fault), the following input value
$Fix_x$ is now guaranteed to be acceptable because it is acceptable in both
states 1 and 3. $Fix_x$ has the effect of removing the fault if it depends on
device $x$, and can nondeterministically remove the precondition for further
fixing ($p = 0$). The resulting uncertainty state is $\{3,4,5,6\}$. The
following input, $P_{fix}$, restores $p = 1$, reducing the uncertainty to the
uncertainty state $\{3,5\}$, and guarantees the acceptability of $Fix_y$. After
$Fix_y$, the circuit is guaranteed to be fixed, but $p$ might be 0 again
(states 6 and 8 in the uncertainty state $\{5,6,7,8\}$). The final $P_{fix}$
reduces the uncertainty to the uncertainty state $\{5,7\}$, and guarantees that
only target states are reached.

Fig. 2. A solution to the example problem

The following definition captures the intuitive arguments given above. 

Definition 1. An input $a$ is acceptable in an uncertainty state
$\emptyset \neq Us \subseteq \mathcal{S}$ iff $a$ is acceptable in every state
in $Us$, i.e.
$\exists\alpha.(\alpha = a \wedge \forall x.(Us(x) \rightarrow Acc(x,\alpha)))$
is not $\emptyset$.

If $a$ is acceptable in $Us$, its image $Image[a](Us)$ is the set of all the
states reachable from $Us$ under $a$, i.e.
$\exists x.(Us(x) \wedge \exists\alpha.(\alpha = a \wedge \mathcal{R}(x,\alpha,x')))[x'/x]$,
where $[x'/x]$ represents parallel variable substitution.

The image of an input sequence $\pi$ in an uncertainty state, written
$Image[\pi](Us)$, is defined as follows.

$$
\begin{array}{lcl}
Image[\epsilon](Us) & = & Us \\
Image[\pi](\emptyset) & = & \emptyset \\
Image[a;\pi](Us) & = & \emptyset, \text{ if } a \text{ is not acceptable in } Us \\
Image[a;\pi](Us) & = & Image[\pi](Image[a](Us)), \text{ otherwise}
\end{array}
$$

The input sequence $\pi$ is a solution to the powerset reachability problem
from $\emptyset \neq \mathcal{I} \subseteq \mathcal{S}$ to
$\emptyset \neq \mathcal{G} \subseteq \mathcal{S}$ iff
$\emptyset \neq Image[\pi](\mathcal{I}) \subseteq \mathcal{G}$.
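
The definitions above can be exercised on the example of figure 1. The
following Python fragment is only a sketch: it uses explicit sets where the
paper uses BDDs, and the table R is our reading of the textual description of
the circuit. In particular, the effect of Fixx and Fixy in the already fixed
states 5 and 7 is an assumption, and the helper names (acceptable, image,
image_seq, is_solution) are hypothetical.

```python
# Transition relation of the running example, reconstructed from the prose.
# Odd states have p = 1, even states p = 0; states 5-8 have c = 1.
# ASSUMPTION: Fixx/Fixy in states 5 and 7 keep the circuit fixed and may spoil p.
R = {
    "Pfix": {1: {1}, 2: {1}, 3: {3}, 4: {3}, 5: {5}, 6: {5}, 7: {7}, 8: {7}},
    "Fixx": {1: {5, 6}, 3: {3, 4}, 5: {5, 6}, 7: {7, 8}},
    "Fixy": {1: {1, 2}, 3: {7, 8}, 5: {5, 6}, 7: {7, 8}},
}


def acceptable(a, us):
    """a is acceptable in Us iff every state of Us has an a-successor."""
    return all(s in R[a] for s in us)


def image(a, us):
    """Image[a](Us); the empty set if a is not acceptable in Us."""
    if not acceptable(a, us):
        return frozenset()
    return frozenset(t for s in us for t in R[a][s])


def image_seq(pi, us):
    """Image[pi](Us), unfolding the recursive definition one input at a time."""
    us = frozenset(us)
    for a in pi:
        if not us:
            return frozenset()
        us = image(a, us)
    return us


def is_solution(pi, init, goal):
    """pi solves the problem iff its image from init is nonempty and within goal."""
    result = image_seq(pi, init)
    return bool(result) and result <= frozenset(goal)


I, G = frozenset({1, 2, 3, 4}), frozenset({5, 7})
PLAN = ["Pfix", "Fixx", "Pfix", "Fixy", "Pfix"]
```

Under these assumptions, is_solution(PLAN, I, G) holds, while any proper prefix
of PLAN leaves at least one non-target state in the image.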

Search in $Pow(\mathcal{S})$ can be performed either forwards (from the initial
uncertainty state $\mathcal{I}$ towards the target uncertainty state
$\mathcal{G}$) or backwards (from $\mathcal{G}$ towards $\mathcal{I}$). Figure
2 depicts a subset of the search space when proceeding forwards. The full
picture can be obtained by considering the effect of the other input values on
the uncertainty states. For instance, the input value $P_{fix}$ on the second
uncertainty state $\{1,3\}$ would result in a self loop, while $Fix_y$ would
lead to $\{1,2,7,8\}$. The first and third uncertainty states cannot be
expanded further, because the input values $Fix_x$ and $Fix_y$ are not
acceptable. When a nonempty uncertainty state $Us_i \subseteq \mathcal{G}$ is
built from $\mathcal{I}$, the associated input sequence (labeling a path from
$\mathcal{I}$ to $Us_i$) is a solution to the problem.

Figure 3 depicts the backward search space. The levels are built from the
target states, on the right, towards the initial ones, on the left. At level 0
we have the pair $(\{5,7\} . \epsilon)$, composed of an uncertainty state and
an input sequence. We call such a pair an uncertainty state-input sequence
(UsS) pair. The dashed arrows represent the strong preimage of each $Us_i$
under the input value $a_i$, i.e. the extraction of the maximal uncertainty
state where $a_i$ is acceptable and guaranteed to result in the uncertainty
state being expanded. At level 1, only the UsS pair
$(\{5,6,7,8\} . P_{fix})$ is built, since the strong preimage of the
uncertainty state 0 for the inputs $Fix_x$ and $Fix_y$ is empty. At level 2,
there are three UsS pairs, with (overlapping) uncertainty states $Us_2$, $Us_3$
and $Us_4$, associated, respectively, with the length-2 sequences
$Fix_x; P_{fix}$, $P_{fix}; P_{fix}$ and $Fix_y; P_{fix}$. (While proceeding
backwards, a sequence is associated with an uncertainty state $Us_i$ if it
labels a path from $Us_i$ to the target set.) Notice that $Us_3$ is equal to
$Us_1$, and therefore deserves no further expansion. The expansion of
uncertainty states 2 and 4 gives the uncertainty states 5 and 6, both obtained
by the strong preimage under $P_{fix}$, while the strong preimage under inputs
$Fix_x$ and $Fix_y$ returns empty uncertainty states. The further expansion of
$Us_5$ results in three uncertainty states. The one resulting from the strong
preimage under $P_{fix}$ is not reported, as it is equal to $Us_5$. Uncertainty
state 7 is also equal to $Us_2$, and deserves no further expansion. Uncertainty
state 8 can be obtained by expanding both $Us_5$ and $Us_6$. At level 5, the
expansion produces $Us_{10}$, which contains all the initial states. Therefore,
both the corresponding sequences are solutions to the problem.

Fig. 3. The Search Space for the Example Problem
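
The level structure described for the backward search can be reproduced with
explicit sets. The sketch below is illustrative only: the table R is again our
reconstruction of the example (the behaviour of Fixx and Fixy in states 5 and 7
is an assumption), and strong_preimage and backward_levels are hypothetical
names, not primitives of the implementation.

```python
# Transition relation of the running example (see the textual description).
# ASSUMPTION: Fixx/Fixy in states 5 and 7 keep the circuit fixed and may spoil p.
R = {
    "Pfix": {1: {1}, 2: {1}, 3: {3}, 4: {3}, 5: {5}, 6: {5}, 7: {7}, 8: {7}},
    "Fixx": {1: {5, 6}, 3: {3, 4}, 5: {5, 6}, 7: {7, 8}},
    "Fixy": {1: {1, 2}, 3: {7, 8}, 5: {5, 6}, 7: {7, 8}},
}
STATES = range(1, 9)


def strong_preimage(a, us):
    """Maximal set of states where a is acceptable and all a-successors are in us."""
    return frozenset(s for s in STATES if s in R[a] and R[a][s] <= us)


def backward_levels(goal, depth):
    """Backward levels; each maps an uncertainty state to its input sequences."""
    goal = frozenset(goal)
    visited = {goal}
    level = {goal: [()]}
    levels = [level]
    for _ in range(depth):
        nxt = {}
        for us, seqs in level.items():
            for a in R:
                pre = strong_preimage(a, us)
                # Empty preimages and already visited states are not expanded.
                if pre and pre not in visited:
                    nxt.setdefault(pre, []).extend((a,) + pi for pi in seqs)
        visited.update(nxt)
        levels.append(nxt)
        level = nxt
    return levels


LEVELS = backward_levels({5, 7}, 5)
ALL_STATES = frozenset(STATES)
```

With this model, level 2 contains the two new uncertainty states {1,5,7} and
{3,5,7}, and level 5 contains the full state set, indexed by the two sequences
mentioned in the text.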

3 Efficient Representation of $Pow(\mathcal{S})$

In this section we describe the symbolic representation of the search space
$Pow(\mathcal{S})$, and the primitives used in the search. Our representation
mechanism combines elements used in symbolic and explicit-state model checking.
The first ingredient is a standard BDD package, providing the symbolic
representation mechanism. Each uncertainty state $Us$ is directly represented
by the BDD $Us(x)$, whose models are exactly the states contained in $Us$. In
practice, the uncertainty state is the pointer to the corresponding BDD. The
second ingredient, from explicit-state model checking, is a hash table, which
is used to store and retrieve pointers to the (BDDs representing the)
uncertainty states which have been visited during the search. The approach
relies heavily on the canonical form of BDDs, which allows for comparison in
constant time. Figure 4 gives an overview of the approach on the data structure
built while analyzing the example. The column on the left shows the variables
in the BDD package. Let us focus first on the lower part, which contains the
state variables, i.e. $x$, $c$ and $p$. (The upper variables, including the
input variables, will be clarified later in this section.) Each uncertainty
state in figure 3 is represented by a (suitably labeled) BDD, shown in the
picture as a subgraph. (Solid [dashed, respectively] arcs in a BDD represent
the positive [negative, resp.] assignment to the variable in the originating
node. For the sake of clarity, only the paths leading to True are shown.) On
the right hand side, two configurations of the hash table of visited
uncertainty states are shown. The picture gives an example of the potential
memory savings which can be obtained thanks to the ability of the BDD package
to minimize BDD memory occupation. Besides uniqueness, there is a large amount
of sharing among different BDDs: for instance, $Us_6$ and $Us_{10}$ share their
sub-nodes with the previously constructed $Us_2$, $Us_3$ and $Us_4$.
Furthermore, the set-theoretic operations for the transformation and
combination of uncertainty states (e.g. projection, equivalence, inclusion) can
be efficiently performed with the primitives provided by the BDD package. The
advantage over an enumerative representation of uncertainty states (e.g. the
list of the state vectors associated with each state contained in the $Us$) is
evident.

Fig. 4. The combined use of BDD and the cache
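
The role of canonicity can be illustrated with a toy BDD store. The class below
is a minimal sketch written for this purpose, not the BDD package used in the
implementation: TinyBDD, mk, from_states and mark_visited are hypothetical
names. Nodes are hash-consed through a unique table, so two uncertainty states
are equal exactly when their root identifiers coincide, and the visited table
only has to store those identifiers.

```python
class TinyBDD:
    """A deliberately small reduced ordered BDD store (illustration only)."""

    FALSE, TRUE = 0, 1

    def __init__(self, nvars):
        self.nvars = nvars
        self.unique = {}            # (var, low, high) -> node id (unique table)
        self.nodes = [None, None]   # ids 0 and 1 are the terminal nodes

    def mk(self, var, low, high):
        """Return the canonical node testing var, creating it only if needed."""
        if low == high:             # redundant test: no node is created
            return low
        key = (var, low, high)
        if key not in self.unique:  # hash-consing: one node per distinct triple
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def from_states(self, states, var=0):
        """Build the BDD whose models are exactly the given bit-vectors."""
        if not states:
            return self.FALSE
        if var == self.nvars:
            return self.TRUE
        low = {s for s in states if s[var] == 0}
        high = {s for s in states if s[var] == 1}
        return self.mk(var,
                       self.from_states(low, var + 1),
                       self.from_states(high, var + 1))


bdd = TinyBDD(3)
visited = set()                     # hash table of visited uncertainty states


def mark_visited(states):
    """Record an uncertainty state; return True iff it was not seen before."""
    node = bdd.from_states(set(states))
    if node in visited:
        return False
    visited.add(node)
    return True


US_A = {(1, 0, 1), (1, 1, 1)}
FIRST = mark_visited(US_A)
SECOND = mark_visited([(1, 1, 1), (1, 0, 1)])   # same set, listed differently
THIRD = mark_visited({(0, 0, 1)})
```

The second call finds the state already stored, because both constructions
yield the same node identifier; the comparison itself is a single integer
lookup.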

The exploration of $Pow(\mathcal{S})$ is based on the use of UsS tables, i.e.
sets of UsS pairs, of the form
$UsST = \{(Us_1 . \pi_1), \ldots, (Us_n . \pi_n)\}$, where the $\pi_i$ are
input sequences of the same length, such that $\pi_i \neq \pi_j$ for all
$1 \leq j \neq i \leq n$. We call $Us_i$ the uncertainty state indexed by
$\pi_i$. When no ambiguity arises, we write $UsST(\pi_i)$ for $Us_i$. A UsS
table represents a level in the search space. For instance, when proceeding
backward (see figure 3), each UsS pair $(Us_i . \pi_i)$ in the UsS table is
such that $Us_i$ is the maximal uncertainty state in which the associated input
sequence is acceptable, and its image is contained in the target states. When
proceeding forward, for each UsS pair, the uncertainty state is the result of
the application of the corresponding input sequence to the initial set.

The key to the efficient search is the symbolic representation of UsS tables,
which allows for compactly storing sets of sets of states (annotated by input
sequences) and their transformations. A UsS table
$\{ (\{s^1_1, \ldots, s^1_{n_1}\} . \pi_1), \ldots, (\{s^k_1, \ldots, s^k_{n_k}\} . \pi_k) \}$
is represented as a relation between input sequences of the same length and
states, by associating directly to each state in the uncertainty state the
indexing input sequence, i.e.
$\{ (s^1_1 . \pi_1), \ldots, (s^1_{n_1} . \pi_1), \ldots, (s^k_1 . \pi_k), \ldots, (s^k_{n_k} . \pi_k) \}$.
Given this view, the expansion can be obtained symbolically as follows. Let us
consider first the UsS table $\{(Us . \epsilon)\}$ represented by the BDD
$Us(x)$. The backward step of expansion BwdExpandUsSTable constructs the UsS
table containing the strong preimage of $Us$ under each of the input values.
This is the set of all state-input pairs where the input is acceptable in the
state and all the successor states are in $Us$. Symbolically, we compute

$$\forall x'.(\mathcal{R}(x,\alpha,x') \rightarrow Us(x)[x/x']) \wedge Acc(x,\alpha)$$

i.e. a BDD in the $x$ and $\alpha$ variables. This represents a relation
between states and length-one input sequences, i.e. a UsS table where each
$Us_i$ is annotated by a length-one input sequence $a_i$.

The dual forward step FwdExpandUsSTable expands $\{(Us . \epsilon)\}$ by
computing the images of $Us$ under every acceptable input:

$$(\exists x.(Us(x) \wedge (\forall x.(Us(x) \rightarrow Acc(x,\alpha))) \wedge \mathcal{R}(x,\alpha,x')))[x'/x]$$

The resulting BDD represents a UsS table, where each $Us_i$ is annotated by a
length-one input sequence $a_i$ such that
$\emptyset \neq Image[a_i](Us) = Us_i$.
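
The two one-step expansions can be read as operations on relations between
states and inputs. The following sketch mirrors the two formulas with explicit
Python sets instead of BDDs; it is not the symbolic implementation. The table R
is our reconstruction of the example of figure 1 (the behaviour of Fixx and
Fixy in states 5 and 7 is an assumption), and bwd_expand, fwd_expand and
as_table are hypothetical names.

```python
# Transition relation of the running example (see the textual description).
# ASSUMPTION: Fixx/Fixy in states 5 and 7 keep the circuit fixed and may spoil p.
R = {
    "Pfix": {1: {1}, 2: {1}, 3: {3}, 4: {3}, 5: {5}, 6: {5}, 7: {7}, 8: {7}},
    "Fixx": {1: {5, 6}, 3: {3, 4}, 5: {5, 6}, 7: {7, 8}},
    "Fixy": {1: {1, 2}, 3: {7, 8}, 5: {5, 6}, 7: {7, 8}},
}
STATES = range(1, 9)


def bwd_expand(us):
    """(state, input) pairs: input acceptable in state, all successors in us."""
    return {(s, a) for a in R for s in STATES
            if s in R[a] and R[a][s] <= us}


def fwd_expand(us):
    """(successor, input) pairs for the inputs acceptable in every state of us."""
    return {(t, a) for a in R if all(s in R[a] for s in us)
            for s in us for t in R[a][s]}


def as_table(pairs):
    """Group a state/input relation into a UsS table: input -> uncertainty state."""
    table = {}
    for s, a in pairs:
        table.setdefault(a, set()).add(s)
    return table


BWD = as_table(bwd_expand({5, 6, 7, 8}))
FWD = as_table(fwd_expand({1, 3}))
```

Grouping the relation by input recovers the UsS pairs: BWD holds the three
uncertainty states of level 2 of the backward search, and FWD the three
successors of {1,3} in the forward search.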

In the general case, a UsS table can contain longer input sequences, and the
vector $\alpha$ of input variables is not enough. Therefore, we use additional
variables to represent the values of the input sequence at the different steps.
To represent input sequences of length $i$, we need $i$ vectors of new BDD
variables, called sequence variables. The vector of sequence variables
representing the $i$-th value of the sequence is written $\pi_{[i]}$, with
$|\pi_{[i]}| = |\alpha|$. Figure 4 shows the UsS table representing the third
level of the backward search space depicted in figure 3. The upper variables in
the order are the input variables i0 and i1 and the sequence variables. When
searching forwards [backwards, respectively] $\pi_{[i]}$ is used to encode the
$i$-th [$i$-th to last, resp.] value in the sequence. The backward expansion
primitive BwdExpandUsSTable can be applied in the general case to a UsS table
$UsST_{i-1}(x, \pi_{[i-1]}, \ldots, \pi_{[1]})$, associating an uncertainty
state to plans of length $i-1$:

$$(\forall x'.(\mathcal{R}(x,\alpha,x') \rightarrow UsST_{i-1}(x,\pi_{[i-1]},\ldots,\pi_{[1]})[x/x']) \wedge Acc(x,\alpha))[\alpha/\pi_{[i]}]$$

As in the length-one case, the next state variables $x'$ in $\mathcal{R}$ and
in $UsST_{i-1}$ (resulting from the substitution) disappear because of the
universal quantification. The input variables $\alpha$ are renamed to the newly
introduced plan variables $\pi_{[i]}$, so that in the next step of the
algorithm the construction can be repeated. The forward step is defined dually.
Notice that the fully symbolic expansion of UsS tables avoids the explicit
enumeration of input values. This can lead to significant advantages when only
a few distinct uncertainty states result from the application of all possible
input values.
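
The general backward step can be mimicked by keeping a UsS table as a relation
between states and input sequences. In the sketch below each tuple position
plays the role of one vector of sequence variables, and the new input is placed
in front, as in the backward encoding. This is an explicit-set illustration
under our reconstruction of the example automaton (the behaviour of Fixx and
Fixy in states 5 and 7 is an assumption); bwd_expand_table and states_of are
hypothetical names.

```python
# Transition relation of the running example (see the textual description).
# ASSUMPTION: Fixx/Fixy in states 5 and 7 keep the circuit fixed and may spoil p.
R = {
    "Pfix": {1: {1}, 2: {1}, 3: {3}, 4: {3}, 5: {5}, 6: {5}, 7: {7}, 8: {7}},
    "Fixx": {1: {5, 6}, 3: {3, 4}, 5: {5, 6}, 7: {7, 8}},
    "Fixy": {1: {1, 2}, 3: {7, 8}, 5: {5, 6}, 7: {7, 8}},
}
STATES = range(1, 9)


def bwd_expand_table(rel):
    """rel: set of (state, sequence) pairs, i.e. a UsS table seen as a relation.
    Returns the relation of the next backward level, (state, (a,) + sequence)."""
    by_seq = {}
    for s, pi in rel:
        by_seq.setdefault(pi, set()).add(s)
    return {(s, (a,) + pi)
            for pi, us in by_seq.items()
            for a in R
            for s in STATES
            if s in R[a] and R[a][s] <= us}


def states_of(rel, pi):
    """The uncertainty state indexed by the sequence pi."""
    return {s for s, q in rel if q == pi}


LEVEL0 = {(5, ()), (7, ())}             # the UsS table {({5,7} . epsilon)}
LEVEL1 = bwd_expand_table(LEVEL0)
LEVEL2 = bwd_expand_table(LEVEL1)       # unpruned: still contains Pfix;Pfix
```

LEVEL2 is the unpruned table: it still contains the pair indexed by Pfix;Pfix,
whose uncertainty state coincides with the one of level 1.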

For either search direction, every time a UsS table is built its uncertainty
states have to be compared with the previously visited uncertainty states. If
not present, they must be inserted in the hash table of the visited uncertainty
states, otherwise they are eliminated. This analysis is performed by a special
purpose primitive, called PruneUsSTable, which operates directly on the BDD
representing the UsS table. The primitive assumes that in the BDD package input
and sequence variables precede state variables (see figure 4). PruneUsSTable
recursively descends the UsS table, and interprets as an uncertainty state
every BDD node having a state variable at its top. It accesses the hash table
of the previously visited uncertainty states with the newly found $Us$: if it
is not present, then it is stored and returned, otherwise the False BDD is
returned, and the traversal continues on different branches of the input and
sequence variables. In this way, a new UsS table is built, where only the $Us$
which had not been previously encountered are left. The pruning step also takes
care of another source of redundancy: UsS tables often contain a large number
of equivalent input sequences, all indexing exactly the same uncertainty state
(in figure 3, two equivalent input sequences are associated with $Us_8$). The
resulting UsS table is such that, for each $Us$, only one (partial) assignment
to the input and sequence variables is left. This simplification can sometimes
lead to dramatic savings.
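
The effect of the pruning step can be sketched on an explicit table. The
function below is not the BDD traversal described above: it works on a Python
dictionary from input sequences to uncertainty states, and prune_table is a
hypothetical name. It shows the two simplifications: uncertainty states already
visited are dropped, and only one sequence is kept for each new uncertainty
state.

```python
def prune_table(table, visited):
    """table: dict mapping an input sequence (tuple) to an uncertainty state.
    Keeps one sequence per uncertainty state not seen before; updates visited."""
    pruned = {}
    for pi in sorted(table):                 # deterministic choice of the survivor
        us = frozenset(table[pi])
        if us and us not in visited:
            visited.add(us)
            pruned[pi] = us
    return pruned


# Visited states after levels 0 and 1 of the backward search of the example.
visited = {frozenset({5, 7}), frozenset({5, 6, 7, 8})}

LEVEL2 = {
    ("Fixx", "Pfix"): {1, 5, 7},
    ("Pfix", "Pfix"): {5, 6, 7, 8},          # equal to the level-1 state
    ("Fixy", "Pfix"): {3, 5, 7},
}
PRUNED2 = prune_table(LEVEL2, visited)

LEVEL4 = {                                   # two sequences, same uncertainty state
    ("Fixy", "Pfix", "Fixx", "Pfix"): {1, 3, 5, 7},
    ("Fixx", "Pfix", "Fixy", "Pfix"): {1, 3, 5, 7},
}
PRUNED4 = prune_table(LEVEL4, visited)
```

Which of the equivalent sequences survives is arbitrary; here the
lexicographically smallest one is kept only to make the sketch deterministic.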

4 Algorithms for Searching $Pow(\mathcal{S})$

In this section we present two examples of search algorithms based on the data
structures and primitives described in the previous section. Both algorithms
take as input the problem description in the form of the BDDs $\mathcal{I}(x)$
and $\mathcal{G}(x)$, while the transition relation $\mathcal{R}$ is assumed to
be globally available.

Figure 5 presents the semi-symbolic forward search algorithm. The algorithm
represents the input sequences associated with the (symbolically represented)
uncertainty states visited during the search as (explicit) lists of input
values. The algorithm is based on the expansion of individual uncertainty
states. OpenUsPool contains the (annotated) uncertainty states which have been
reached but still have to be explored, and is initialized to the first
uncertainty state of the search, i.e. $\mathcal{I}$, annotated with the empty
input sequence $\epsilon$. UsMarkVisited inserts $\mathcal{I}$ into the hash
table of visited uncertainty states. The algorithm loops (lines 3-11) until a
solution is found or all the search space has been exhausted. First, an
annotated uncertainty state $(Us . \pi)$ is extracted from the open pool (line
4) by ExtractBest. The uncertainty state is expanded by FwdExpandUsSTable,
computing the corresponding UsS table (with length-one sequences). The
resulting UsS table is traversed as explained in the previous section,
accessing with each uncertainty state the hash table of the already visited
uncertainty states, discarding all the occurrences of uncertainty states that
are already present, and marking the new ones. The primitive PruneListUsSTable
is a version of PruneUsSTable that returns the explicit list of the UsS pairs
in the pruned UsS table. Each of the resulting uncertainty states is compared
with the set of target states $\mathcal{G}$. If $Us_i \subseteq \mathcal{G}$,
then the associated input sequence $\pi; a_i$ is a solution to the problem, the
loop is exited and the sequence is returned. Otherwise, the annotated
uncertainty state $(Us_i . \pi; a_i)$ is inserted in OpenUsPool and the loop is
resumed. If OpenUsPool becomes empty and a solution has not been found, then a
fix point has been reached, i.e. all the reachable space of uncertainty states
has been covered, and the algorithm terminates with failure. Depending on
ExtractBest and Insert, different search strategies (e.g. depth-first,
breadth-first, best-first) can be implemented.

procedure SemiSymFwdSearch(I, G)
0  begin
1    OpenUsPool := {(I . ε)}; UsMarkVisited(I);
2    Solved := I ⊆ G; Solution := ε;
3    while (OpenUsPool ≠ ∅ ∧ ¬Solved) do
4      (Us . π) := ExtractBest(OpenUsPool);
5      UsSTable := FwdExpandUsSTable(Us);
6      UsSList := PruneListUsSTable(UsSTable);
7      for (Us_i . a_i) in UsSList do
8        if Us_i ⊆ G then
9          Solved := True; Solution := π;a_i; break;
10       else Insert((Us_i . π;a_i), OpenUsPool) endif;
11   end while
12   if Solved then return Solution;
13   else return Fail;
14 end

Fig. 5. The semi-symbolic, forward search algorithm.
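
A compact executable rendering of the procedure of figure 5 is given below as a
sketch. It replaces BDDs with frozensets and the hash table of BDD pointers
with a Python set, uses a FIFO pool (one possible choice for ExtractBest and
Insert, giving breadth-first search), and runs on our reconstruction of the
example automaton, where the behaviour of Fixx and Fixy in states 5 and 7 is an
assumption. The function names are hypothetical.

```python
from collections import deque

# Transition relation of the running example (see the textual description).
# ASSUMPTION: Fixx/Fixy in states 5 and 7 keep the circuit fixed and may spoil p.
R = {
    "Pfix": {1: {1}, 2: {1}, 3: {3}, 4: {3}, 5: {5}, 6: {5}, 7: {7}, 8: {7}},
    "Fixx": {1: {5, 6}, 3: {3, 4}, 5: {5, 6}, 7: {7, 8}},
    "Fixy": {1: {1, 2}, 3: {7, 8}, 5: {5, 6}, 7: {7, 8}},
}


def fwd_expand(us):
    """Images of us under every acceptable input: input -> uncertainty state."""
    table = {}
    for a, succ in R.items():
        if all(s in succ for s in us):
            table[a] = frozenset(t for s in us for t in succ[s])
    return table


def semi_sym_fwd_search(init, goal):
    """Return an input sequence (list) or None; init is assumed to be nonempty."""
    init, goal = frozenset(init), frozenset(goal)
    if init <= goal:
        return []                           # the empty sequence is a solution
    visited = {init}
    open_pool = deque([(init, [])])         # FIFO pool: breadth-first strategy
    while open_pool:
        us, pi = open_pool.popleft()
        for a, us_i in fwd_expand(us).items():
            if us_i in visited:
                continue                    # pruned: already in the visited table
            visited.add(us_i)
            if us_i <= goal:
                return pi + [a]
            open_pool.append((us_i, pi + [a]))
    return None                             # fix point reached: failure


PLAN = semi_sym_fwd_search({1, 2, 3, 4}, {5, 7})
NO_PLAN = semi_sym_fwd_search({1, 2, 3, 4}, {5})
```

Replacing popleft with pop turns the same loop into a depth-first search, at
the price of losing the minimal-length guarantee.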

Figure 6 shows the fully-symbolic, backward search algorithm. The algorithm
relies on sequence variables for a symbolic representation of the input
sequences, and recursively expands the UsS tables, thus implementing a
breadth-first symbolic search. The algorithm proceeds from $\mathcal{G}$
towards $\mathcal{I}$, exploring a search space built as in figure 3. The array
UsSTables is used to store the UsS tables representing the levels of the search
associated with input sequences of increasing length. The algorithm first
checks (line 4) if $\epsilon$ is a solution. If not, the while loop is entered.
At each iteration, input sequences of increasing length are explored (lines 5
to 8). The step at line 6 expands the UsS table in UsSTables[i-1] and stores
the resulting UsS table in UsSTables[i]. UsS pairs which are redundant with
respect to the current search are eliminated from UsSTables[i] (line 7). The
possible solutions contained in UsSTables[i] are extracted and stored in
Solutions (line 8). The loop terminates if either a solution is found
(Solutions ≠ ∅), or the space of input sequences has been completely explored
(UsSTables[i] = ∅).

procedure FullySymBwdSearch(I, G)
0  begin
1    i := 0; UsMarkVisited(G);
2    UsSTables[0] := {(G . ε)};
3    Solutions := BwdExtractSolution(UsSTables[0]);
4    while ((UsSTables[i] ≠ ∅) ∧ (Solutions = ∅)) do
5      i := i + 1;
6      UsSTables[i] := BwdExpandUsSTable(UsSTables[i-1]);
7      UsSTables[i] := PruneUsSTable(UsSTables[i]);
8      Solutions := BwdExtractSolution(UsSTables[i]);
9    done
10   if (UsSTables[i] = ∅) then
11     return Fail;
12   else return Solutions;
13 end

Fig. 6. The fully-symbolic, backward search algorithm.

BwdExtractSolution checks if a UsS table contains an uncertainty state $Us_i$
such that $\mathcal{I} \subseteq Us_i$. It takes as input the BDD
representation of a UsS table $UsST_i(x, \pi_{[i]}, \ldots, \pi_{[1]})$, and
extracts the assignments to sequence variables such that the corresponding set
contains the initial states, by computing
$\forall x.(\mathcal{I}(x) \rightarrow UsST_i(x, \pi_{[i]}, \ldots, \pi_{[1]}))$.
The result is a BDD in the sequence variables $\pi_{[i]}, \ldots, \pi_{[1]}$.
If the BDD is False, then there are no solutions of length $i$. Otherwise, each
of the satisfying assignments of the resulting BDD represents a solution
sequence.

The algorithms described here are only two representatives of a family of
possible algorithms. For instance, it is possible to proceed forwards in the
fully-symbolic search, and to proceed backwards in the semi-symbolic search.
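
The level-by-level structure of the backward procedure of figure 6 can be
sketched with explicit tables. In the fragment below a UsS table is a
dictionary from sequences to frozensets rather than a single BDD, so the
expansion enumerates the inputs, which the symbolic primitive avoids. The
automaton is our reconstruction of the example (the behaviour of Fixx and Fixy
in states 5 and 7 is an assumption), and all function names are hypothetical.

```python
# Transition relation of the running example (see the textual description).
# ASSUMPTION: Fixx/Fixy in states 5 and 7 keep the circuit fixed and may spoil p.
R = {
    "Pfix": {1: {1}, 2: {1}, 3: {3}, 4: {3}, 5: {5}, 6: {5}, 7: {7}, 8: {7}},
    "Fixx": {1: {5, 6}, 3: {3, 4}, 5: {5, 6}, 7: {7, 8}},
    "Fixy": {1: {1, 2}, 3: {7, 8}, 5: {5, 6}, 7: {7, 8}},
}
STATES = range(1, 9)


def bwd_expand(table):
    """Strong preimage of every uncertainty state under every input."""
    new = {}
    for pi, us in table.items():
        for a in R:
            pre = frozenset(s for s in STATES if s in R[a] and R[a][s] <= us)
            if pre:
                new[(a,) + pi] = pre
    return new


def prune(table, visited):
    """Drop visited uncertainty states; keep one sequence per new state."""
    kept = {}
    for pi, us in table.items():
        if us not in visited:
            visited.add(us)
            kept[pi] = us
    return kept


def extract_solutions(table, init):
    """Sequences whose uncertainty state contains all the initial states."""
    return [pi for pi, us in table.items() if init <= us]


def fully_sym_bwd_search(init, goal):
    init, goal = frozenset(init), frozenset(goal)
    visited = {goal}
    table = {(): goal}                      # level 0: {(G . epsilon)}
    solutions = extract_solutions(table, init)
    while table and not solutions:
        table = prune(bwd_expand(table), visited)
        solutions = extract_solutions(table, init)
    return solutions or None                # None stands for Fail


SOLUTIONS = fully_sym_bwd_search({1, 2, 3, 4}, {5, 7})
FAILURE = fully_sym_bwd_search({1, 2, 3, 4}, {5})
```

Because pruning keeps a single sequence per uncertainty state, only one of the
two equivalent length-5 solutions of the example is returned.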

The algorithms enjoy the following properties. First, they always terminate.
This follows from the fact that the set of explored uncertainty states (stored
in the visited hash table) is monotonically increasing: at each step we proceed
only if at least one new uncertainty state is generated. The newly constructed
UsS tables are simplified by removing the uncertainty states which do not
deserve further expansion. Since the set of accumulated uncertainty states is
contained in $Pow(\mathcal{S})$, which is finite, a fix point is eventually
reached. Second, failure is returned if and only if there is no solution to the
given problem; otherwise a solution sequence is returned. This property follows
from the following facts. In the semi-symbolic search, the constructed UsS
pairs $(Us . \pi)$ enjoy the property $Image[\pi](\mathcal{I}) = Us$. Thus,
$\pi$ is a solution to the problem $(\mathcal{I} . Us)$. In the fully symbolic
search, the constructed UsS pairs are such that
$\emptyset \neq Image[\pi](Us) \subseteq \mathcal{G}$. Thus, $\pi$ is a
solution to the problem $(Us . \mathcal{G})$. The fully symbolic algorithm is
also optimal, i.e. it returns plans of minimal length. This property follows
from the breadth-first style of the search.

5 Experimental Evaluation 

The data structures and the algorithms (semi- and fully-symbolic forward and 
backward search) have been implemented on top of the symbolic model checker 
NuSMV [5]. An open hashing mechanism is used to store visited uncertainty 
states. We present a preliminary experimental evaluation of our approach. We 
report two sets of experiments. The first ones are from artificial intelligence 
planning. (The results are labeled with AI in table 1.) FIXi is the generaliza- 
tion of the example system of figure 1 to i devices. For lack of space, we refer 
to [6] for the description of the other problems. In most cases, the automaton 
is fully nondeterministic, and there are acceptability conditions for the inputs. 
The problems are specified by providing the initial and target sets. 

The second class of tests is based on the ISCAS89 [7] and MCNC [16] circuit
benchmarks, where the problem is to find a synchronization sequence, i.e. to
reach a condition of certainty (i.e. a single final state) from a completely
unspecified initial condition. We ran the same test cases as reported in
[12,13]. In order to tackle these problems, we extended the forward^1 search
algorithms (both semi- and fully-symbolic) with an ad-hoc routine for checking
if a given uncertainty state is a solution (i.e. if it contains exactly one
state).
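
The ad-hoc termination test amounts to checking that the current uncertainty
state is a singleton. The sketch below applies it in a breadth-first forward
search over explicit sets. The machine DELTA is a hypothetical four-state
deterministic automaton chosen for illustration; it is not one of the benchmark
circuits, and find_sync_sequence is a hypothetical name.

```python
from collections import deque

# Hypothetical deterministic machine (not one of the benchmark circuits).
DELTA = {
    "a": {0: 1, 1: 2, 2: 3, 3: 0},      # cyclic shift of the four states
    "b": {0: 0, 1: 1, 2: 2, 3: 0},      # merges state 3 into state 0
}


def image(a, us):
    """Image of an uncertainty state under input a (always acceptable here)."""
    return frozenset(DELTA[a][s] for s in us)


def find_sync_sequence(states):
    """Breadth-first forward search from the fully unspecified initial condition."""
    start = frozenset(states)
    visited = {start}
    queue = deque([(start, [])])
    while queue:
        us, word = queue.popleft()
        if len(us) == 1:                # ad-hoc termination test: certainty reached
            return word
        for a in DELTA:
            nxt = image(a, us)
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, word + [a]))
    return None                         # no synchronization sequence exists


WORD = find_sync_sequence(range(4))
```

Since the machine is deterministic, the size of the uncertainty state never
grows along a sequence; in the nondeterministic setting considered in this
paper the same test applies, but the images may also become larger.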

To the best of our knowledge, no formal verification system able to solve this
kind of problem is available, therefore we could not perform a direct
experimental comparison. In [6], a detailed comparative evaluation shows that
FSB outperforms all the other approaches to conformant planning (based on a
decision procedure for QBF [14], on heuristic search [1], and on planning
graphs [15]). The results of our approach to searching synchronization
sequences appear to be at least as good as the ones in [12,13]. Normalizing the
results with respect to the platform,^2 especially for the problems with longer
reset sequences (e.g. planet, sand) we obtain a significant speed-up and we are
able to return shorter solutions. Furthermore, our approach tackles a more
complex problem. Indeed, the approach in [12,13] is tailored to synchronization
problems, and the system is assumed to be deterministic, i.e. uncertainty,
intended as the number of indistinguishable states, is guaranteed to be
non-increasing. We deal with fully nondeterministic systems, where uncertainty
can also grow. Finally, our results were obtained using a monolithic transition
relation (although nothing prevents the use of partitioning techniques).

To summarize, the experimental results seem to confirm the following intu- 
itions. The semi-symbolic approach is often much faster than the fully-symbolic 

^ In order to proceed backwards when searching for a synchronization sequence, the 
starting point must be the set of all singletons of size |<S| . Although possible in theory, 
the approach seems to be unfeasible in practice. 

^ From the limited information available, we estimate that the results in [12,13] were 
obtained on a machine at most 15 times slower than ours. 
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AI 


1 SSF 1 


1 FSB 


Name 


# ff 


# I 


L 


Time 


L 


Time 


fix 2 


3 


2 


5 


0.001 


5 


0.001 


fixlO 


6 


4 


21 


0.001 


21 


0.440 


fixl6 


6 


5 


33 


0.010 


33 


56.190 


bmtcl021 


7 


5 


18 


0.020 


18 


1.220 


bmtcl02m 


7 


5 


19 


0.020 


19 


1.190 


bmtcl02h 


7 


5 


19 


0.020 


19 


1.190 


bmtclObl 


11 


7 


18 


0.100 


14 


62.590 


bmtclObm 


11 


7 


18 


0.100 


17 


64.970 


bmtclObh 


11 


7 


18 


0.100 


17 


64.970 


ring2 


5 


2 


6 


0.001 


5 


0.001 


ringlO 


34 


2 


76 


0.100 


29 


60.940 


uring2 


5 


2 


5 


0.001 


5 


0.001 


uringlO 


34 


2 


29 


0.050 


29 


1.260 


cubec 


12 


3 


64 


0.040 


60 


39.210 


cubes 


12 


3 


58 


0.030 


54 


16.140 


cubee 


12 


3 


42 


0.020 


42 


1.350 


omel50 


15 


3 


X 


4.400 


X 


1.380 


omellOO 


17 


3 


X 


67.190 


X 


8.130 



MCNC’91 


ISSFs^^el 


IFSFs^^e 


Name 


#FF 


#I 


L 


Time 


L 


Time 


bbara 


4 


4 




0.000 




0.010 


bbsse 


4 


7 


2 


0.030 


2 


0.010 


bbtas 


3 


2 


3 


0.000 


3 


0.000 


beecount 


3 


3 


1 


0.000 


1 


0.000 


cse 


4 


7 


1 


0.000 


1 


0.010 


dkl4 


3 


3 


2 


0.000 


2 


0.000 


dkl5 


2 


3 


3 


0.000 


1 


0.000 


dkl6 


5 


2 


4 


0.000 


4 


0.010 


dkl7 


3 


2 


3 


0.000 


3 


0.010 


dk27 


3 


1 


4 


0.000 


4 


0.000 


dk512 


4 


1 


5 


0.000 


4 


0.000 


donfile 


5 


2 


3 


0.000 


3 


0.000 


exl 


5 


9 


3 


0.000 


3 


0.160 


ex2 


5 


2 


X 


0.000 


X 


0.000 


ex3 


4 


2 


X 


0.000 


X 


0.000 


ex4 


4 


6 


13 


0.010 


10 


1.150 


ex 5 


4 


2 


X 


0.000 


X 


0.000 


ex6 


3 


5 


1 


0.000 


1 


0.000 


ex 7 


4 


2 


X 


0.000 


X 


0.000 


keyb 


5 


7 


2 


0.010 


2 


0.010 


lion9 


4 


2 


X 


0.000 


X 


0.000 


markl 


4 


5 


1 


0.000 


1 


0.000 


opus 


4 


5 


1 


0.000 


1 


0.000 


planet 


6 


7 


20 


0.110 




M.O. 


si 


5 


8 


3 


0.020 


“3 


0.800 


sla 


5 


8 


3 


0.020 


3 


0.810 


s8 


3 


4 


4 


0.000 


4 


0.010 


sand 


5 


11 


19 


0.190 




T.O. 


tav 


2 


4 


X 


0.020 


IT 


0.000 


tbk 


5 


6 


1 


0.080 


1 


0.000 


trainll 


4 


2 


X 


0.000 


X 


0.000 



ISCAS’89 


1 SSFs^nc 1 


iFSSs^^e 


Name 


#FF 


#I 


L 


Time 


L 


Time 


sll96 


18 


14 




3.370 




0.280 


sl238 


18 


14 


1 


3.320 


1 


0.300 


sl488 


6 


8 


1 


0.010 


1 


0.000 


sl494 


6 


8 


1 


0.010 


1 


0.010 


S208.1 


8 


10 


X 


0.000 


X 


0.000 


s27 


3 


4 


1 


0.010 


1 


0.000 


s298 


14 


3 


2 


0.010 


2 


0.040 


s344 


15 


9 


2 


0.300 


2 


6.090 


s349 


15 


9 


2 


0.300 


2 


6.100 


s382 


21 


3 


1 


0.010 


1 


0.010 


s386 


6 


7 


2 


0.040 


2 


0.020 


s400 


21 


3 


1 


0.010 


1 


0.000 


S420.1 


16 


18 


X 


0.120 


X 


0.000 


s444 


21 


3 


1 


0.030 


1 


0.020 


s510 


6 


19 




T.O. 




T.O. 


s526 


21 


3 




0.090 




0.120 


s641 


19 


35 


1 


1.550 


1 


0.150 


s713 


19 


35 


1 


0.540 


1 


0.150 


s820 


5 


18 


1 


0.150 


1 


0.050 


s832 


5 


18 




0.140 




0.040 


S838.1 


32 


34 


X 


0.430 


X 


0.000 



The experiments were executed on an Intel 300MHz Pentium-II, 512Mb RAM,
running Linux. #FF and #I are the number of boolean state variables and
inputs in the system automaton. SSF and FSB are the semi-symbolic forward
and the fully-symbolic backward algorithms. SSF_sync and FSF_sync are the
semi-symbolic and fully-symbolic forward search algorithms extended with the
ad-hoc termination test for synchronization sequences. Times are reported in
seconds. T.O. means time out after 1 hour CPU. M.O. means memory limit
of 500Mb exhausted. L is the length of the solution found. X means that the
problem admits no solution.



Table 1. Experimental results

one, when a solution exists. This appears to be caused by the additional sequence
variables, and by the breadth-first style of the search. However, the fully-symbolic
approach appears to be superior in discovering that the problem admits no
solution, and returns sequences of minimal length. Forward search (either fully- or
semi-symbolic) is usually inferior to backward (with some notable exceptions).

6 Related Work and Conclusions 

In this paper we have presented a new approach to the problem of searching 
powerset automata which tackles the exponential blowup directly related to the 
powerset construction. Our approach combines techniques from symbolic and 
explicit-state model checking, and allows for different, complementary search 
strategies. The work presented in this paper is based on the work in [6],
developed in the field of Artificial Intelligence planning, where fully symbolic search
is described. In this paper we extend [6] with semi-symbolic search techniques,
and we provide a comparative evaluation of the approaches on a larger set of test
cases, including synchronization sequences from the ISCAS and MCNC
benchmark circuits. Besides [12,13], discussed in the previous section, few other works
appear to be related to ours. In SPIN [8], the idea of combining a symbolic
representation with explicit-state model checking is also present: an automaton-like
structure is used to compactly represent the set of visited states. In [4], an
external hash table is combined with a BDD package in order to extract additional
information for guided search. In both cases, however, the integration of such
techniques is directed to standard model checking problems.

The work presented in this paper will be extended as follows. An extensive
experimental evaluation, together with a tighter integration of optimized model
checking techniques, is currently being carried out. Then, different search methods
(e.g. combining forward and backward search, partitioning of UsS tables) will
be investigated. Furthermore, the approach, currently presented for reachability
problems, will be generalized to deal with LTL specifications. Finally, the case of
partial observability (i.e. when a limited amount of information can be acquired
at run time) will be tackled.

Abstract. We present a novel algorithm for generating state spaces of
asynchronous systems using Multi-valued Decision Diagrams. In contrast
to related work, we encode the next-state function of a system not as a
single Boolean function, but as cross-products of integer functions. This
permits the application of various iteration strategies to build a system’s
state space. In particular, we introduce a new elegant strategy, called
saturation, and implement it in the tool SMART. On top of usually
performing several orders of magnitude faster than existing BDD-based
state-space generators, our algorithm’s required peak memory is often
close to the final memory needed for storing the overall state space.



1 Introduction 

State-space generation is one of the most fundamental challenges for many formal
verification tools, such as model checkers [13]. The high complexity of today's
digital systems requires constructing and storing huge state spaces in the rel-
atively small memory of a workstation. One research direction widely pursued 
in the literature suggests the use of decision diagrams, usually Binary Decision 
Diagrams [7] (BDDs), as a data structure for implicitly representing large sets 
of states in a compact fashion. This proved to be very successful for the veri- 
fication of synchronous digital circuits, as it increased the manageable sizes of 
state spaces from about 10^ states, with traditional explicit state-space genera- 
tion techniques [14], to about 10^^ states [9]. Unfortunately, symbolic techniques 
are known not to work well for asynchronous systems, such as communication 
protocols, which particularly suffer from state-space explosion. 

tration under NASA Contract No. NAS 1-97046 while the authors were in residence
at the Institute for Computer Applications in Science and Engineering (ICASE),

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 328-342, 2001.

The latter problem was addressed in previous work by the authors in the
context of state-space generation using Multi-valued Decision Diagrams [18]
(MDDs), which exploited the fact that, in event-based asynchronous systems,
each event updates just a few components of a system's state vector [10]. Hence,
firing an event only requires the application of local next-state functions and the 
local manipulation of MDDs. This is in contrast to classic BDD-based techniques 
which construct state spaces by iteratively applying a single, global next-state 
function which is itself encoded as a BDD [20]. Additionally, in most concur- 
rency frameworks including Petri nets [23] and process algebras [5], next-state 
functions satisfy a product form allowing each component of the state vector to 
be updated somewhat independently of the others. Experimental results imple- 
menting these ideas of locality showed significant improvements in speed and 
memory consumption when compared to other state-space generators [22]. 

In this paper, we take our previous approach a significant step further by observing
that the reachable state space of a system can be built by firing the system's
events in any order, as long as every event is considered often enough [16].
We exploit this freedom by proposing a novel strategy which exhaustively fires
all events affecting a given MDD node, thereby bringing it to its final saturated 
shape. Moreover, nodes are considered in a depth-first fashion, i.e., when a node 
is processed, all its descendants are already saturated. The resulting state-space 
generation algorithm is not only concise, but also allows for an elegant proof 
of correctness. Compared to our previous work [10], saturation eliminates a fair 
amount of administration overhead, reduces the average number of firing events, 
and enables a simpler and more efficient cache management. 

We implemented the new algorithm in the tool SMART [11], and experimen- 
tal studies indicate that it performs on average about one order of magnitude 
faster than our old algorithm. Even more important and in contrast to related 
work, the peak memory requirements of our algorithm are often close to its final 
memory requirements. In the case of the dining philosophers’ problem, we are 
able to construct the state space of about 10^^^ states, for 1000 philosophers, in 
under one second on an 800 MHz Pentium III PC using only 390KB of memory.

2 MDDs for Encoding Structured State Spaces 

State spaces and next-state functions. A discrete-state model expressed in
a high-level formalism must specify: (i) $\widehat{S}$, the set of potential states describing
the “type” of states; (ii) $s \in \widehat{S}$, the initial state; and (iii)
$\mathcal{N} : \widehat{S} \to 2^{\widehat{S}}$, the next-state function, describing which states can be
reached from a given state in a single step. In many cases, such as Petri nets and
process algebras, a model expresses this function as a union
$\mathcal{N} = \bigcup_{e \in \mathcal{E}} \mathcal{N}_e$, where $\mathcal{E}$ is a finite set of events
and $\mathcal{N}_e$ is the next-state function associated with event $e$. We say that
$\mathcal{N}_e(s)$ is the set of states the system can enter when event $e$ occurs, or fires,
in state $s$. Moreover, event $e$ is called disabled in $s$ if $\mathcal{N}_e(s) = \emptyset$;
otherwise, it is enabled.

The reachable state space $S \subseteq \widehat{S}$ of the model under consideration is the
smallest set containing the initial system state $s$ and closed with respect to $\mathcal{N}$,
i.e., $S = \{s\} \cup \mathcal{N}(s) \cup \mathcal{N}(\mathcal{N}(s)) \cup \cdots = \mathcal{N}^*(s)$,
where "$*$" denotes the reflexive and transitive closure. When $\mathcal{N}$ is composed of
several functions $\mathcal{N}_e$, for $e \in \mathcal{E}$, we can iterate these functions in
any order, as long as we consider each $\mathcal{N}_e$ often enough. In other words,
$i \in S$ if and only if it can be reached from $s$ through zero or more event firings.
In this paper we assume that $S$ is finite; however, for most practical asynchronous
systems, the size of $S$ is enormous due to the state-space explosion problem.
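
The closure property above can be illustrated with a small explicit (non-symbolic) sketch. It assumes each event is given as a Python function returning the set of successor states; the toy events `inc_a` and `move` are hypothetical and not taken from the paper. The point it shows is that the order in which events are fired does not change the resulting fixed point.

```python
def reachable(initial, events):
    """Smallest set containing `initial` and closed under every event function."""
    space = {initial}
    changed = True
    while changed:                      # stop once no event adds a state
        changed = False
        for fire in events:             # any order works, as long as all are retried
            new = {t for s in space for t in fire(s)} - space
            if new:
                space |= new
                changed = True
    return space

# Hypothetical toy model with states (a, b), a in 0..2, b in 0..1.
def inc_a(state):
    a, b = state
    return {(a + 1, b)} if a < 2 else set()          # disabled when a == 2

def move(state):
    a, b = state
    return {(a - 1, b + 1)} if a > 0 and b < 1 else set()

S1 = reachable((0, 0), [inc_a, move])
S2 = reachable((0, 0), [move, inc_a])                # different firing order
```

Both calls yield the same six states; only the number of passes through the loop may differ.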

Multi-valued decision diagrams. One way to cope with this problem is to
use efficient data structures to encode $S$ that exploit the system’s structure. We
consider a common case in asynchronous system design, where a system model
is composed of $K$ submodels, for some $K \in \mathbb{N}$, so that a global system state
is a $K$-tuple $(i^K, \ldots, i^1)$, where $i^k$ is the local state for submodel $k$. (We use
superscripts for submodel indices, not for exponentiation, and subscripts for
event indices.) Thus, $\widehat{S} = S^K \times \cdots \times S^1$, with each local state space $S^k$
having some finite size $n^k$. In Petri nets, for example, the set of places can be
partitioned into $K$ subsets, and the marking can be written as the composition of
the $K$ corresponding submarkings. When identifying $S^k$ with the initial integer
interval $\{0, \ldots, n^k - 1\}$, for each $K \ge k \ge 1$, one can encode $S \subseteq \widehat{S}$ via a
(quasi-reduced ordered) MDD, i.e., a directed acyclic edge-labelled multi-graph where:

– Nodes are organized into $K + 1$ levels. We write (k.p) to denote a generic
node, where k is the level and p is a unique index for that level. Level K
contains only a single non-terminal node (K.r), the root, whereas levels K−1
through 1 contain one or more non-terminal nodes. Level 0 consists of two
terminal nodes, (0.0) and (0.1). (We use boldface for the node indices 0 or 1
since these have a special meaning, as we will explain later.)

– A non-terminal node (k.p) has $n^k$ arcs pointing to nodes at level k − 1. If
the $i$-th arc, for $i \in S^k$, is to node (k−1.q), we write (k.p)[i] = q. Unlike
in the original BDD setting [7, 8], we allow for redundant nodes, having all
arcs pointing to the same node. This will be convenient for our purposes, as
eliminating such nodes would lead to arcs spanning multiple levels.

– A non-terminal node cannot duplicate (i.e., have the same pattern of arcs
as) another node at the same level.

Given a node (k.p), we can recursively define the node reached from it through
any integer sequence $\gamma =_{df} (i^k, i^{k-1}, \ldots, i^l) \in S^k \times S^{k-1} \times \cdots \times S^l$
of length $k - l + 1$, for $K \ge k \ge l \ge 1$, as

$$\mathit{node}((k.p), \gamma) = \begin{cases} (k.p) & \text{if } \gamma = (), \text{ the empty sequence} \\ \mathit{node}((k-1.q), \delta) & \text{if } \gamma = (i^k, \delta) \text{ and } (k.p)[i^k] = q. \end{cases}$$

The substates encoded by p or reaching p are then, respectively,

$$\mathcal{B}((k.p)) = \{\beta \in S^k \times \cdots \times S^1 : \mathit{node}((k.p), \beta) = (0.1)\} \quad \text{“below” } (k.p);$$

$$\mathcal{A}((k.p)) = \{\alpha \in S^K \times \cdots \times S^{k+1} : \mathit{node}((K.r), \alpha) = (k.p)\} \quad \text{“above” } (k.p).$$

Thus, $\mathcal{B}((k.p))$ contains the substates that, prefixed by a substate in $\mathcal{A}((k.p))$,
form a (global) state encoded by the MDD. We reserve the indices 0 and 1 at
each level k to encode the sets $\emptyset$ and $S^k \times \cdots \times S^1$, respectively. In particular,
$\mathcal{B}((0.0)) = \emptyset$ and $\mathcal{B}((0.1)) = \{()\}$. Fig. 1 shows a four-level example MDD and
the set S encoded by it; only the highlighted nodes are actually stored.
[Figure 1: four-level MDD diagram; the node layout is not recoverable from the
extracted text. Local state spaces: $S^4 = \{0,1,2,3\}$, $S^3 = \{0,1,2\}$,
$S^2 = \{0,1\}$, $S^1 = \{0,1,2\}$.]

S = { 1000, 1010, 1100, 1110, 1210, 2000,
      2010, 2100, 2110, 2210, 3010, 3110,
      3200, 3201, 3202, 3210, 3211, 3212 }



Fig. 1. An example MDD and the state space S encoded by it. 
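
To make the encoding concrete, the set S of Fig. 1 can be stored in a small hash-consed MDD. The following is only a sketch under simplifying assumptions: the function names (`build_mdd`, `count`, `contains`) are hypothetical, node ids are global rather than per level, id 0 stands for the empty set at every level, id 1 is the terminal (0.1), and the reserved index 1 for the full set at each level is not modelled.

```python
def build_mdd(states, sizes):
    """Build an MDD for `states`, tuples (i^K, ..., i^1); sizes = [n^K, ..., n^1]."""
    K = len(sizes)
    unique = {}      # (level, arcs) -> node id: forbids duplicate nodes
    arcs_of = {}     # node id -> (level, arcs)

    def make(level, subs):
        if not subs:
            return 0                       # empty set of substates
        if level == 0:
            return 1                       # terminal node (0.1)
        arcs = tuple(
            make(level - 1, {t[1:] for t in subs if t[0] == i})
            for i in range(sizes[K - level])
        )
        key = (level, arcs)
        if key not in unique:              # redundant nodes are kept, duplicates are not
            unique[key] = len(unique) + 2
            arcs_of[unique[key]] = key
        return unique[key]

    return make(K, set(states)), arcs_of

def count(node, arcs_of):
    """Number of substates encoded below `node`, i.e. |B(node)|."""
    if node in (0, 1):
        return node
    return sum(count(child, arcs_of) for child in arcs_of[node][1])

def contains(root, arcs_of, state):
    """Follow the arcs labelled by `state` from the root; accept on reaching (0.1)."""
    node = root
    for i in state:
        if node == 0:
            return False
        node = arcs_of[node][1][i]
    return node == 1

FIG1 = [tuple(int(c) for c in s) for s in
        ("1000 1010 1100 1110 1210 2000 2010 2100 2110 2210 "
         "3010 3110 3200 3201 3202 3210 3211 3212").split()]
root, arcs_of = build_mdd(FIG1, sizes=[4, 3, 2, 3])
```

With this input, `count(root, arcs_of)` gives the 18 states listed in Fig. 1, and only two distinct level-1 nodes are created because many prefixes share the same set of level-1 substates.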



Many algorithms for generating state spaces using BDDs exist [20], which can
easily be adapted to MDDs. In contrast to those, however, our approach does not
encode the next-state function as an MDD over 2K variables, describing the K
state components before and after a system step. Instead, we update MDD nodes
directly, adding the new states reached through one step of the global next-state
function when firing some event. For asynchronous systems, this function is often
expressible as the cross-product of local next-state functions.

Product-form behavior. An asynchronous system model exhibits such behavior
if, for each event $e$, its next-state function $\mathcal{N}_e$ can be written as a
cross-product of K local functions, i.e., $\mathcal{N}_e = \mathcal{N}_e^K \times \cdots \times \mathcal{N}_e^1$,
where $\mathcal{N}_e^k : S^k \to 2^{S^k}$, for $K \ge k \ge 1$. (Recall that event $e$ is disabled
in some global state exactly if it is disabled in at least one component.) The
product-form requirement is quite natural. First, many modeling formalisms satisfy
it, e.g., any Petri net model conforms to this behavior for any partition of its
places. Second, if a given model does not respect the product-form behavior, we can
always coarsen K or refine $\mathcal{E}$ so that it does. As an example, consider a model
partitioned into four submodels, where
$\mathcal{N}_e = \mathcal{N}_e^4 \times \mathcal{N}_e^{3,2} \times \mathcal{N}_e^1$, but
$\mathcal{N}_e^{3,2} : S^3 \times S^2 \to 2^{S^3 \times S^2}$ cannot be expressed
as a product $\mathcal{N}_e^3 \times \mathcal{N}_e^2$. We can achieve the product-form requirement
by simply partitioning the model into three, not four, submodels. Alternatively, we may
substitute event $e$ with “subevents” satisfying the product form. This is possible
since, in the worst case, we can define a subevent $e_{i,j}$, for each
$i = (i^K, \ldots, i^1)$ and $j = (j^K, \ldots, j^1) \in \mathcal{N}_e(i)$, with
$\mathcal{N}^k_{e_{i,j}}(i^k) = \{j^k\}$ and $\mathcal{N}^k_{e_{i,j}}(l^k) = \emptyset$ for $l^k \ne i^k$.
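
As a small illustration of the product form, the sketch below fires one event on a global state by taking the cross-product of its local next-state sets. The helper `fire` and the three local functions are hypothetical; levels the event does not affect are modelled by an identity function.

```python
from itertools import product

def fire(local_fns, state):
    """N_e(state) as the cross-product of the local next-state sets.

    local_fns[m] maps the m-th component of `state` to a set of local states;
    the result is empty as soon as one component disables the event.
    """
    parts = [fn(i) for fn, i in zip(local_fns, state)]
    return set(product(*parts))

# Hypothetical event over three levels (listed from level 3 down to level 1).
identity = lambda i: {i}                          # level 3 is unaffected
level2 = lambda i: {1} if i == 0 else set()       # enabled only in local state 0
level1 = lambda i: {0, 2} if i == 1 else set()    # nondeterministic local update

enabled = fire([identity, level2, level1], (1, 0, 1))
disabled = fire([identity, level2, level1], (1, 1, 1))
```

Here `enabled` contains the two successors (1, 1, 0) and (1, 1, 2), while `disabled` is empty because level 2 alone disables the event.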

Finally, we introduce some notational conventions. We say that event $e$ depends
on level $k$, if the local state at level $k$ does affect the enabling of $e$ or if it
is changed by the firing of $e$. Let First(e) and Last(e) be the first and last levels
on which event $e$ depends. Events $e$ such that First(e) = Last(e) = k are said
to be local events; we merge these into a single macro-event $l_k$ without violating
the product-form requirement, since we can write
$\mathcal{N}_{l_k} = \mathcal{N}^K_{l_k} \times \cdots \times \mathcal{N}^1_{l_k}$
where $\mathcal{N}^k_{l_k} = \bigcup_{e : First(e) = Last(e) = k} \mathcal{N}^k_e$, while
$\mathcal{N}^l_{l_k}(i^l) = \{i^l\}$ for $l \ne k$ and $i^l \in S^l$. The set
$\{e \in \mathcal{E} : First(e) = k\}$ of events “starting” at level $k$ is denoted
by $\mathcal{E}^k$. We also extend $\mathcal{N}_e$ to substates instead of full states:
$\mathcal{N}_e((i^k, \ldots, i^l)) = \mathcal{N}^k_e(i^k) \times \cdots \times \mathcal{N}^l_e(i^l)$
for $K \ge k \ge l \ge 1$; to sets of states:
$\mathcal{N}_e(\mathcal{X}) = \bigcup_{i \in \mathcal{X}} \mathcal{N}_e(i)$
for $\mathcal{X} \subseteq S^k \times \cdots \times S^l$; and to sets of events:
$\mathcal{N}_{\mathcal{F}}(\mathcal{X}) = \bigcup_{e \in \mathcal{F}} \mathcal{N}_e(\mathcal{X})$
for $\mathcal{F} \subseteq \mathcal{E}$.
In particular, we write $\mathcal{N}_{\le k}$ as a shorthand for $\mathcal{N}_{\{e : First(e) \le k\}}$.

3 A Novel Algorithm Employing Node Saturation

In the following we refer to a specific order of iterating the local next-state
functions of an asynchronous system model as iteration strategy. Clearly, the choice
of strategy influences the efficiency of state-space generation. In our previous
work [10] we employed a naive strategy that cycled through MDDs level-by-level
and fired, at each level k, all events e with First(e) = k.

As main contribution of this paper, we present a novel iteration strategy,
called saturation, which not only simplifies our previous algorithm, but also
significantly improves its time and space efficiency. The key idea is to fire events
node-wise and exhaustively, instead of level-wise and just once per iteration.
Formally, we say that an MDD node (k.p) is saturated if it encodes a set of
states that is a fixed point with respect to the firing of any event at its level
or at a lower level, i.e., if $\mathcal{B}((k.p)) = \mathcal{N}^*_{\le k}(\mathcal{B}((k.p)))$ holds;
it can easily be shown by contradiction that any node below node (k.p) must be
saturated, too.
It should be noted that the routine for firing some event, in order to reveal
and add globally reachable states to the MDD representation of the state space
under construction, is similar to [10]. In particular, MDDs are manipulated only
locally with respect to the levels on which the fired event depends, and, due to the
product-form behavior, these manipulations can be carried out very efficiently.
We do not further comment on these issues here, but concentrate solely on the
new idea of node saturation and its implications.
Just as in traditional symbolic state-space generation algorithms, we use
a unique table, to detect duplicate nodes, and operation caches, in particular 
a union cache and a firing cache, to speed-up computation. However, our ap- 
proach is distinguished by the fact that only saturated nodes are checked in the 
unique table or referenced in the caches. Given the MDD encoding of the ini- 
tial state s, we saturate its nodes bottom-up. This improves both memory and 
execution-time efficiency for generating state spaces because of the following 
reasons. First, our saturation order ensures that the firing of an event affecting 
only the current and possibly lower levels adds as many new states as possible. 
Then, since each node in the final encoding of S is saturated, any node we insert 
in the unique table has at least a chance of being part of the final MDD, while 
any unsaturated node inserted by a traditional symbolic approach is guaranteed 
to be eventually deleted and replaced with another node encoding a larger subset 
of states. Finally, once we saturate a node at level k, we never need to fire any
event $e \in \mathcal{E}^k$ in it again, while, in classic symbolic approaches, $\mathcal{N}$ is applied to
the entire MDD at every iteration.

In the pseudo-code of our new algorithm implementing node saturation,
which is shown in Fig. 2, we use the data types evnt (model event), lcl (local
state), lvl (level), and idx (node index within a level); in practice these are simply
integers in appropriate ranges. We also assume the following dynamically-sized
global hash tables: (a) UT[k], for $K \ge k \ge 1$, the unique table for nodes at level k,
to retrieve p given the key (k.p)[0], ..., (k.p)[$n^k - 1$]; (b) UC[k], for $K > k \ge 1$,
the union cache for nodes at level k, to retrieve s given nodes p and q, where
$\mathcal{B}((k.s)) = \mathcal{B}((k.p)) \cup \mathcal{B}((k.q))$; and (c) FC[k], for $K > k \ge 1$, the firing cache
for nodes at level k, to retrieve s given node p and event e, where First(e) > k
and $\mathcal{B}((k.s)) = \mathcal{N}^*_{\le k}(\mathcal{N}_e(\mathcal{B}((k.p))))$. Furthermore, we use K dynamically-sized
arrays to store nodes, so that (k.p) can be efficiently retrieved as the p-th entry of
the k-th array. The call Generate(s) creates the MDD encoding the initial state,
saturating each MDD node as soon as it creates it, in a bottom-up fashion.
Hence, when it calls Saturate(k, r), all children of (k.r) are already saturated.
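
The routines named above (Generate, Saturate, RecFire, Union and the unique-table check) can be sketched compactly in Python. This is an illustrative simplification under stated assumptions, not the pseudo-code of Fig. 2: the class `Saturation`, its method names and the example events are hypothetical, node ids are global rather than per level, id 0 denotes the empty set, id 1 the terminal (0.1), the reserved "full set" index is not modelled, and no garbage collection is performed. Each event is a dictionary mapping a level to its local next-state table; levels absent from the dictionary are left unchanged by the event.

```python
class Saturation:
    def __init__(self, sizes, events):
        self.n = sizes                  # sizes[k] = n^k for k = 1..K; sizes[0] unused
        self.events = events            # list of {level: {i: set of next local states}}
        self.node, self.unique = {}, {}   # saturated nodes and unique table
        self.fcache, self.ucache = {}, {}  # firing cache and union cache

    def local(self, e, k, i):           # local next-state set of event e at level k
        table = self.events[e].get(k)
        return {i} if table is None else table.get(i, set())

    def check(self, level, arcs):       # canonical id of a saturated node
        key = (level, tuple(arcs))
        if not any(arcs):
            return 0
        if key not in self.unique:
            self.unique[key] = len(self.unique) + 2
            self.node[self.unique[key]] = key
        return self.unique[key]

    def union(self, a, b):
        if a == b or b == 0:
            return a
        if a == 0:
            return b
        key = (min(a, b), max(a, b))
        if key not in self.ucache:
            level, xa = self.node[a]
            merged = [self.union(x, y) for x, y in zip(xa, self.node[b][1])]
            self.ucache[key] = self.check(level, merged)
        return self.ucache[key]

    def saturate(self, k, arcs):        # arcs: mutable list, children already saturated
        mine = [e for e, ev in enumerate(self.events) if max(ev) == k]  # First(e) == k
        changed = True
        while changed:
            changed = False
            for e in mine:
                todo = {i for i in range(self.n[k]) if arcs[i] and self.local(e, k, i)}
                while todo:
                    i = todo.pop()
                    f = self.rec_fire(e, k - 1, arcs[i])
                    for j in (self.local(e, k, i) if f else ()):
                        u = self.union(f, arcs[j])
                        if u != arcs[j]:
                            arcs[j], changed = u, True
                            if self.local(e, k, j):
                                todo.add(j)

    def rec_fire(self, e, l, q):
        if l < min(self.events[e]):     # below Last(e): nothing changes
            return q
        if (e, q) not in self.fcache:
            src, arcs = self.node[q][1], [0] * self.n[l]
            for i in range(self.n[l]):
                nxt = self.local(e, l, i) if src[i] else ()
                f = self.rec_fire(e, l - 1, src[i]) if nxt else 0
                for j in (nxt if f else ()):
                    arcs[j] = self.union(f, arcs[j])
            if any(arcs):
                self.saturate(l, arcs)  # saturate the new node before it is shared
            self.fcache[(e, q)] = self.check(l, arcs)
        return self.fcache[(e, q)]

    def generate(self, init):           # init[k] = initial local state at level k
        r = 1
        for k in range(1, len(self.n)):
            arcs = [0] * self.n[k]
            arcs[init[k]] = r
            self.saturate(k, arcs)
            r = self.check(k, arcs)
        return r

    def states(self, p):                # enumerate B(p) as tuples (i^k, ..., i^1)
        if p < 2:
            return [()] * p
        return [(i,) + t for i, c in enumerate(self.node[p][1]) for t in self.states(c)]

# Hypothetical three-level model: one local event and two synchronizing events.
events = [
    {1: {0: {1}, 1: {2}}},                      # local event at level 1
    {2: {0: {1}}, 1: {2: {0}}},                 # depends on levels 2 and 1
    {3: {0: {1}}, 2: {1: {2}}, 1: {0: {1}}},    # depends on all three levels
]
sat = Saturation(sizes=[None, 3, 3, 2], events=events)
root = sat.generate(init=[None, 0, 0, 0])
reached = set(sat.states(root))
```

For this toy model the sketch reaches eight states. Note that only nodes returned by `check` are ever placed in the unique table or the caches, mirroring the property that only saturated nodes are checked in or cached.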

Theorem 1 (Correctness). Consider a node (k.p) with $K \ge k \ge 1$ and saturated
children. Moreover, (a) let (l.q) be one of its children, satisfying $q \ne 0$ and
$l = k - 1$; (b) let $\mathcal{U}$ stand for $\mathcal{B}((l.q))$ before the call RecFire(e, l, q), for some
event e with l < First(e), and let $\mathcal{V}$ represent $\mathcal{B}((l.f))$, where f is the value
returned by this call; and (c) let $\mathcal{X}$ and $\mathcal{Y}$ denote $\mathcal{B}((k.p))$ before and after calling
Saturate(k, p), respectively. Then, (i) $\mathcal{V} = \mathcal{N}^*_{\le l}(\mathcal{N}_e(\mathcal{U}))$ and
(ii) $\mathcal{Y} = \mathcal{N}^*_{\le k}(\mathcal{X})$.

By choosing, for node (k.p), the root (K.r) of the MDD representing the initial
system state s, we obtain
$\mathcal{Y} = \mathcal{N}^*_{\le K}(\mathcal{B}((K.r))) = \mathcal{N}^*_{\le K}(\{s\}) = S$, as desired.

Proof. To prove both statements we employ a simultaneous induction on k. For
the induction base, k = 1, we have: (i) The only possible call RecFire(e, 0, 1)
immediately returns 1 because of the test on l (cf. line 1). Then,
$\mathcal{U} = \mathcal{V} = \{()\}$ and $\{()\} = \mathcal{N}^*_{\le 0}(\mathcal{N}_e(\{()\}))$.
(ii) The call Saturate(1, p) repeatedly explores $l_1$, the only event in $\mathcal{E}^1$,
in every local state i for which $\mathcal{N}^1_{l_1}(i) \ne \emptyset$ and for which (1.p)[i]
is either 1 at the beginning of the “while $\mathcal{L} \ne \emptyset$” loop, or has been
modified (cf. line 12) from 0 to 1, which is the value of f, hence u, since the call
RecFire(e, 0, 1) returns 1. The iteration stops when further attempts to fire $l_1$
do not add any new state to $\mathcal{B}((1.p))$. At this point,
$\mathcal{Y} = \mathcal{N}^*_{l_1}(\mathcal{X}) = \mathcal{N}^*_{\le 1}(\mathcal{X})$.

For the induction step we assume that the calls to Saturate(k − 1, ·) as well
as to RecFire(e, l − 1, ·) work correctly. Recall that l = k − 1.

(i) Unlike Saturate (cf. line 14), RecFire does not add further local states to $\mathcal{L}$,
since it modifies “in-place” the new node (l.s), and not node (l.q) describing
the states from where the firing is explored. The call RecFire(e, l, q) can be
resolved in three ways. If l < Last(e), then the returned value is f = q and
$\mathcal{N}_e(\mathcal{U}) = \mathcal{U}$ for any set $\mathcal{U}$; since q is saturated,
$\mathcal{B}((l.q)) = \mathcal{N}^*_{\le l}(\mathcal{B}((l.q))) = \mathcal{N}^*_{\le l}(\mathcal{N}_e(\mathcal{B}((l.q))))$.
If $l \ge$ Last(e) but RecFire has been called previously
with the same parameters, then the call Find(FC[l], {q, e}, s) is successful.
Since node q is saturated and in the unique table, it has not been
modified further; note that in-place updates are performed only on nodes
not yet in the unique table. Thus, the value s in the cache is still valid
and can be safely used. Finally, we need to consider the case where the
call RecFire(e, l, q) performs “real work.” First, a new node (l.s) is created,
having all its arcs initialized to 0. We explore the firing of e in each
state i satisfying $(l.q)[i] \ne 0$ and $\mathcal{N}^l_e(i) \ne \emptyset$. By induction hypothesis, the
recursive call RecFire(e, l − 1, (l.q)[i]) returns
$\mathcal{N}^*_{\le l-1}(\mathcal{N}_e(\mathcal{B}((l-1.(l.q)[i]))))$.
Hence, when the “while $\mathcal{L} \ne \emptyset$” loop terminates,
$\mathcal{B}((l.s)) = \bigcup_{i \in S^l} \mathcal{N}^l_e(i) \times \mathcal{N}^*_{\le l-1}(\mathcal{N}_e(\mathcal{B}((l-1.(l.q)[i])))) = \mathcal{N}^*_{\le l-1}(\mathcal{N}_e(\mathcal{B}((l.q))))$
holds. Thus, all children of node (l.s) are saturated. According to the induction
hypothesis, the call Saturate(l, s) correctly saturates (l.s). Consequently, we have
$\mathcal{B}((l.s)) = \mathcal{N}^*_{\le l}(\mathcal{N}_e(\mathcal{B}((l.q))))$ after the call.

(ii) As in the base case, Saturate(k, p) repeatedly explores the firing of each
event e that is locally enabled in $i \in S^k$; it calls RecFire(e, k−1, (k.p)[i]) that,
as shown above and since l = k−1, returns
$\mathcal{N}^*_{\le k-1}(\mathcal{N}_e(\mathcal{B}((k-1.(k.p)[i]))))$. Further,
Saturate(k, p) terminates when firing the events in $\mathcal{E}^k = \{e_1, e_2, \ldots\}$
does not add any new state to $\mathcal{B}((k.p))$. At this point, the set $\mathcal{Y}$ encoded
by (k.p) is the fixed-point of the iteration

^ u ■ ■ 0) 

initialized with $\mathcal{X}$. Hence, $\mathcal{Y} = \mathcal{N}^*_{\le k}(\mathcal{X})$, as desired. □




Fig. 3. Example of the execution of the Saturate and RecFire routines. 



Fig. 3 illustrates our saturation-based state-space generation algorithm on a
small example, where $K = 3$, $|S^3| = 2$, $|S^2| = 3$, and $|S^1| = 3$. The initial
state is (0, 0, 0), and there are three local events $l_1$, $l_2$, and $l_3$, plus two further
events, $e_{21}$ (depending on levels 2 and 1) and $e_{321}$ (depending on all levels).
Their effects, i.e., their next-state functions, are summarized in the table at the
top of Fig. 3; the symbol "*" indicates that a level does not affect an event. The
MDD encoding {(0, 0, 0)} is displayed in Snapshot (a). Nodes (3.2) and (2.2) are
actually created in Steps (b) and (g), respectively, but we show them from the
beginning for clarity. The level lvl of a node (lvl.idx) is given at the very left
of the MDD figures, whereas the index idx is shown to the right of each node.
We use dashed lines for newly created objects, double boxes for saturated nodes,
and shaded local states for substates enabling the event to be fired. We do not
show nodes with index 0, nor any arcs to them.

– Snapshots (a–b): The call Saturate(1, 2) updates node (1.2) to represent the
effect of firing $l_1$; the result is equal to the reserved node (1.1).

– Snapshots (b–f): The call Saturate(2, 2) fires event $l_2$, adding arc (2.2)[1]
to (1.1) (cf. Snapshot (c)). It also fires event $e_{21}$ which finds the “enabling
pattern” (*, 0, 1), with arbitrary first component, and starts building the
result of the firing, through the sequence of calls RecFire($e_{21}$, 1, (2.2)[0]) and
RecFire($e_{21}$, 0, (1.1)[1]). Once node (1.3) is created and its arc (1.3)[0] is
set to 1 (cf. Snapshot (d)), it is saturated by repeatedly firing event $l_1$.
Node (1.3) then becomes identical to node (1.1) (cf. Snapshot (e)). Hence,
it is not added to the unique table but deleted. Returning from RecFire on
level 1 with result (1.1), arc (2.2)[1] is updated to point to the outcome of
the firing (cf. Snapshot (f)). This does not add any new state to the MDD,
since $\{1\} \times \{0\}$ was already encoded in $\mathcal{B}((2.2))$.

– Snapshots (f–o): Once (2.2) is saturated, we call Saturate(3, 2). Local event $l_3$
is not enabled, but event $e_{321}$ is, by the pattern (0, 0, 0). The calls to RecFire
build a chain of nodes encoding the result of the firing (cf. Snapshots (g–i)).
Each of them is in turn saturated (cf. Snapshots (h–j)), causing first the
newly created node (1.4) to be deleted, since it becomes equal to node (1.1),
and second the saturated node (2.3) to be added to the MDD. The firing
of $e_{321}$ (cf. Snapshot (k)) not only adds state (1, 2, 1), but the entire
subspace $\{1\} \times \{1, 2\} \times S^1$, now known to be exhaustively explored, as node (2.3)
is marked saturated. Event $l_3$, which was found disabled in node (3.2) at the
first attempt, is now enabled, and its firing calls Union(2, (3.2)[1], (3.2)[0]).
The result is a new node which is found by Check to be the reserved
node (2.1) (cf. Snapshot (m)). This node encoding $S^2 \times S^1$ is added as the
descendant of node (3.2) in position 0, and the former descendant (2.2) in that
position is removed (cf. Snapshot (n)), causing it to become disconnected
and deleted. Further attempts to fire events $l_3$ or $e_{321}$ add no more states to
the MDD, whence node (3.2) is declared saturated (cf. Snapshot (o)). Thus,
our algorithm terminates and returns the MDD encoding of the overall state
space $(\{0\} \times S^2 \times S^1) \cup (\{1\} \times \{1, 2\} \times S^1)$.

To summarize, since MDD nodes are saturated as soon as they are created,
each node will either be present in the final diagram or will eventually become
disconnected, but it will never be modified further. This reduces the amount
of work needed to explore subspaces. Once all events in $\mathcal{E}^k$ are exhaustively
fired in some node (k.p), any additional state discovered that uses (k.p) for its
encoding benefits in advance from the “knowledge” encapsulated in (k.p) and
its descendants.

4 Garbage Collection and Optimizations

Garbage collection. MDD nodes can become disconnected, i.e., unreachable
from the root, and should be “recycled.” Disconnection is detected by associating
an incoming-arc counter to each node (k.p). Recycling disconnected nodes is a
major issue in traditional symbolic state-space generation algorithms, where
usually many nodes become disconnected. In our algorithm, this phenomenon is
much less frequent, and the best runtime is achieved by removing these nodes
only at the end; we refer to this policy as Lazy policy.

We also implemented a Strict policy where, if a node (k.p) becomes disconnected,
its “delete-flag” is set and its arcs (k.p)[i] are re-directed to (k − 1.0),
with possible recursive effects on the nodes downstream. When a hit in the union
cache UC[k] or the firing cache FC[k] returns s, we consider this entry stale if the
delete-flag of node (k.s) is set. By keeping a per-level count of the nodes with
delete-flag set, we can decide in routine NewNode(k) whether to (a) allocate new
memory for a node at level k or (b) recycle the indices and the physical memory
of all nodes at level k with delete-flag set, after having removed all the entries
in UC[k] and FC[k] referring to them. The threshold that triggers recycling
can be set in terms of number of nodes or bytes of memory. The policy using
a threshold of one node, denoted as Strict(1), is optimal in terms of memory
consumption, but has a higher overhead due to more frequent clean-ups.

Optimizations. First, observe that the two outermost loops in Saturate ensure
that firing some event $e \in \mathcal{E}^k$ does not add any new state. If we always consider
these events in the same order, we can stop iterating as soon as $|\mathcal{E}^k|$ consecutive
events have been explored without revealing any new state. This saves $|\mathcal{E}^k|/2$
firing attempts on average, which translates to speed-ups of up to 25% in our
experimental studies. Also, in Union, the call Insert(UC[k], {p, q}, s) records that
$\mathcal{B}((k.s)) = \mathcal{B}((k.p)) \cup \mathcal{B}((k.q))$. Since this implies
$\mathcal{B}((k.s)) = \mathcal{B}((k.p)) \cup \mathcal{B}((k.s))$
and $\mathcal{B}((k.s)) = \mathcal{B}((k.s)) \cup \mathcal{B}((k.q))$, we can, optionally, also issue the calls
Insert(UC[k], {p, s}, s), if $s \ne p$, and Insert(UC[k], {q, s}, s), if $s \ne q$. This
speculative union heuristic improves performance by up to 20%.
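
The speculative union heuristic can be illustrated with a toy cache. In this sketch the class `UnionCache` is hypothetical, and Python frozensets stand in for MDD nodes, so the "real" union is just set union rather than a recursive MDD operation; only the caching pattern is the point.

```python
class UnionCache:
    """Union cache keyed by unordered pairs, with speculative extra entries."""

    def __init__(self):
        self.table = {}
        self.misses = 0                  # number of unions actually computed

    def union(self, p, q):
        if p == q:
            return p
        key = frozenset((p, q))          # {p, q}: the order of operands is irrelevant
        if key not in self.table:
            self.misses += 1
            s = p | q                    # stand-in for the recursive MDD union
            self.table[key] = s
            if s != p:                   # speculative entry: p united with s is s
                self.table[frozenset((p, s))] = s
            if s != q:                   # speculative entry: q united with s is s
                self.table[frozenset((q, s))] = s
        return self.table[key]

cache = UnionCache()
a, b = frozenset({1, 2}), frozenset({2, 3})
s = cache.union(a, b)                    # computed, then cached three times
again = cache.union(a, s)                # answered by a speculative entry
```

After both calls, `cache.misses` is still 1: the second request was never computed because its answer had been inserted speculatively.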



5 Experimental Results 

In this section we compare the performance of our new algorithm, using both the
Strict and Lazy policies, with previous MDD-based ones, namely the traditional
Recursive MDD approach in [22] and the level-by-level Forwarding-arcs
approach in [10]. All three approaches are implemented in SMART [11], a
tool for the logical and stochastic-timing analysis of discrete-state systems. For
asynchronous systems, these approaches greatly outperform the more traditional
BDD-based approaches [20], where next-state functions are encoded using
decision diagrams. To evaluate our saturation algorithm, we have chosen a suite of
examples with a wide range of characteristics. In all cases, the state space sizes
depend on a parameter $N \in \mathbb{N}$.

– The classic N queens problem requires finding a way to position N queens on
an N × N chess board such that they do not attack each other. Since there will
be exactly one queen per row in the final solution, we use a safe (i.e., at most
one token per place) Petri net model with N × N transitions and N rows,
one per MDD level, of N + 1 places. For $1 \le i, j \le N$, place $p_{ij}$ is initially
empty, and place $p_{i0}$ contains the token (queen) still to be placed on row i of
the chess board. Transition $t_{ij}$ moves the queen from place $p_{i0}$ to place $p_{ij}$,
in competition with all other transitions $t_{il}$, for $l \ne j$. To encode the mutual
exclusion of queens on the same column or diagonal, we employ inhibitor
arcs. A correct placement of the N queens corresponds to a marking where
all places $p_{i0}$ are empty. Note that our state space contains all reachable
markings, including those where queens n to N still need to be placed, for
any n. In this model, locality is poor, since $t_{ij}$ depends on levels 1 through i.

– The dining philosophers and slotted ring models [10, 25] are obtained by
connecting N identical safe subnets in a circular fashion. The MDD has
N/2 MDD levels (two subnets per level) for the former model and N levels
(one subnet per level) for the latter. Events are either local or synchronize
adjacent subnets, thus they span only two levels, except for those
synchronizing subnet N with subnet 1, which span the entire MDD.

– The round-robin mutex protocol model [17] also has N identical safe subnets
placed in a circular fashion, which represent N processes, each mapped to
one MDD level. Another subnet models a resource shared by the N processes,
giving rise to one more level, at the bottom of the MDD. There are no local
events and, in addition to events synchronizing adjacent subnets, the model
contains events synchronizing levels n and 1, for $2 \le n \le N + 1$.

– The flexible manufacturing system (FMS) model [22] has a fixed shape, but is
parameterized by the initial number N of tokens in some places. We partition
this model into 19 subnets, giving rise to a 19-level MDD with a moderate
degree of locality, as events span from two to six levels.

Fig. 4 compares three variants of our new algorithm, using the Lazy policy or the 
Strict policy with thresholds of 1 or 100 nodes per level, against the Recursive 
algorithm in [22] and the Forwarding algorithm in [10]. We ran SMART on an 
800 MHz Intel Pentium III PC under Linux. In the left column, Fig. 4 reports 
the size of the state space for each model and value of $N$. The graphs in the 
middle and right columns show the peak and final number of MDD nodes and 
the CPU time in seconds required for the state-space generations, respectively. 

For the models introduced above, our new approach is up to two orders of 
magnitude faster than [22] (a speed-up factor of 384 is obtained for the 1000 din- 
ing philosophers’ model), and up to one order of magnitude faster than [10] (a 
speed-up factor of 38 is achieved for the slotted ring model with 50 slots). These 
results are observed for the Lazy variant of the algorithm, which yields the best 
runtimes; the Strict policy also outperforms [22] and [10]. Furthermore, the 
gap keeps increasing as we scale up the models. Just as important, the satura- 
tion algorithm tends to use many fewer MDD nodes, hence less memory. This 
is most apparent in the FMS model, where the difference between the peak and 
the final number of nodes is just a constant, 10, for any Strict policy. Also 
notable is the reduced memory consumption for the slotted ring model, where 
the Strict(1) policy uses 23 times fewer nodes compared to [22], for $N = 50$. 
In terms of absolute memory requirements, the number of nodes is essentially 
proportional to bytes of memory. For reference, the largest memory consumption 
in our experiments using saturation was recorded at 9.7MB for the FMS model 
with 100 tokens; auxiliary data structures required up to 2.5MB for encoding 
the next-state functions and 200KB for storing the local state spaces, while the 
caches used less than 1MB. Other SMART structures account for another 4MB. 

Fig. 4. State-space sizes, memory consumption, and generation times (a 
logarithmic scale is used on the y-axis for the latter). Note that the curves in the 
upper left diagram are almost identical, thus they visually coincide. 

In a nutshell, regarding generation time, the best algorithm is Lazy, 
followed by Strict(100), Strict(1), Forwarding, and Recursive. With 
respect to memory consumption, the best algorithm is Strict(1), followed by 
Strict(100), Lazy, Forwarding, and Recursive. Thus, our new algorithm 
is consistently faster and uses less memory than previously proposed approaches. 
The worst model for all algorithms is the queens problem, which has a very large 
number of nodes in the final representation of S and little locality. Even here, 
however, our algorithm uses slightly fewer nodes and is substantially faster. 

6 Related Work 

We already pointed out the significant differences of our approach to symbolic 
state-space generation when compared to traditional approaches reported in the 
literature [20], which are usually deployed for model checking [12]. Hence, for a 
fair comparison, we should extend our algorithmic implementation to that of a 
full model checker first. Doing this is out of the scope of the present paper and 
is currently work in progress. 

The following paragraphs briefly survey some orthogonal and alternative ap- 
proaches to improving the scalability of symbolic state-space generation and 
model-checking techniques. Regarding synchronous hardware systems, symbolic 
techniques using BDDs, which can represent state spaces in sublinear space, have 
been thoroughly investigated. Several implementations of BDDs are available; 
we refer the reader to [27] for a survey on BDD packages and their perfor- 
mance. To improve the time efficiency of BDD-based algorithms, breadth-first 
BDD-manipulation algorithms [4] have been explored and compared against 
the traditional depth-first ones. However, the results show no significant speed- 
ups, although breadth-first algorithms lead to more regular access patterns of 
hash tables and caches. Regarding space efficiency, a fair amount of work has 
concentrated on choosing appropriate variable orderings and on dynamically re- 
ordering variables [15]. 

For asynchronous software systems, symbolic techniques have been investi- 
gated less, and mostly only in the setting of Petri nets. For safe Petri nets, BDD- 
based algorithms for the generation of the reachability set have been developed 
in [25] via encoding each place of a net as a Boolean variable. These algorithms 
are capable of generating state spaces of large nets within hours. Recently, more 
efficient encodings of nets have been introduced, which take place invariants into 
account [24], although the underlying logic is still based on Boolean variables. 
In contrast, our work uses a more general version of decision diagrams, namely 
MDDs [18, 22], where more complex information is carried in each node of a dia- 
gram. In particular, MDDs allow for a natural encoding of asynchronous system 
models, such as distributed embedded systems. 

For the sake of completeness, we briefly mention some other BDD-based tech- 
niques exploiting the component-based structure of many digital systems. They 
include partial model checking [3], compositional model checking [19], partial- 
order reduction [2], and conjunctive decompositions [21]. Finally, also note that 
approaches to symbolic verification have been developed, which do not rely on 
decision diagrams but instead on arithmetic or algebra [1, 6, 26]. 

7 Conclusions and Future Work 

We presented a novel approach for constructing the state spaces of asynchronous 
system models using MDDs. By not encoding the global next-state function 
as an MDD, but splitting it into several local next-state functions instead, 
we gained the freedom to choose the sequence of event firings, which controls the 
fixed-point iteration resulting in the desired global state space. Our central con- 
tribution is the development of an elegant iteration strategy based on saturating 
MDD nodes. Its utility is proved by experimental studies which show that our 
algorithm often performs several orders of magnitude faster than most existing 
algorithms. Equally important, the peak size of the MDD is usually kept close 
to its final size. 

Regarding future work, we plan to employ our idea of saturation for imple- 
menting an MDD-based CTL model checker within SMART [11], to compare 
that model checker to state-of-the-art BDD-based model checkers, and to test 
our tool on examples that are extracted from real software. 

Abstract. Testing is the dominant validation activity used by industry 
today, and there is an urgent need for improving its effectiveness, both with 
respect to the time and resources for test generation and execution, and the 
test coverage obtained. We present a new technique for automatic generation of real-time 
black-box conformance tests for non-deterministic systems from a determinizable 
class of timed automata specifications with a dense time interpretation. In con- 
trast to other attempts, our tests are generated using a coarse equivalence class 
partitioning of the specification. To analyze the specification, to synthesize the 
timed tests, and to guarantee coverage with respect to a coverage criterion, we 
use the efficient symbolic techniques recently developed for model checking of 
real-time systems. Application of our prototype tool to a realistic specification 
shows promising results in terms of both the test suite size, and the time and 
space used for test generation. 



1 Background 

Testing consists of executing a program or a physical system with the intention of find- 
ing undiscovered errors. In typical industrial projects, as much as a third of the total 
development time is spent on testing, and it therefore constitutes a significant portion of 
the cost of the product. Since testing is the dominant validation activity used by 
industry today, there is an urgent need for improving its effectiveness, both with respect 
to the time and resources used for test generation and execution, and obtained coverage. 

A potential improvement that is being examined by researchers is to make testing a 
formal method, and to provide tools that automate test case generation and execution. 
This approach has experienced some level of success: Formal specification and 
automatic test generation are being applied in practice [7, 20, 23, 26], and commercial test 
generation tools are emerging [17, 24]. Typically, a test generation tool inputs some 
kind of finite state machine description of the behavior required of the implementation. 
A formalized implementation relation describes exactly what it means for an implemen- 
tation to be correct with respect to a specification. The tool interprets the specification 
or transforms it to a data structure appropriate for test generation, and then computes 
a set of test sequences. Since exhaustive testing is generally infeasible, it must select 
only a subset of tests for execution. Test selection can be based on manually stated test 
purposes, or on a coverage criterion of the specification or implementation. 

However, these tools do not address real-time systems, or provide only limited 
support for testing the timing aspects. They often abstract away the actual time at which 
events are supplied or expected, or do not select these time instances thoroughly and 
systematically. To test real-time systems, the specification language must be extended 
with constructs for expressing real-time constraints, the implementation relation must 
be generalized to consider the temporal dimension, and the data structures and algo- 
rithms used to generate tests must be revised to operate on a potentially infinite set of 
states. Further, the test selection problem is worsened because a huge number of time 
instances are relevant to test. It is therefore necessary to make good decisions of when 
to deliver an input to the system, and when to expect an output. Since real-time systems 
are often safety critical, the time dimension must be tested thoroughly and 
systematically. Automated test generation for real-time systems is a fairly new research area, and 
only a few proposals exist that deal with these problems. 

This paper presents a new technique for automatic generation of timed tests from 
a restricted class of dense timed automata specifications. We permit both non-deter- 
ministic specifications and (black-box) implementations. Our implementation relation 
is therefore based on Hennessy’s classical testing theory [21] for concurrent systems, 
which we have generalized to take time into account. We propose to select test cases 
by partitioning the state space into coarse-grained equivalence classes which preserve 
essential timing and deadlock information, and select a few tests for each class. This 
approach is inspired by sequential black-box testing techniques frequently referred to 
as domain- or partition testing [3]. We regard the clocks of a timed specification as 
(oddly behaving) input parameters. 

We present an algorithm and data structure for systematically generating timed Hen- 
nessy tests. The algorithm ensures that the specification will be covered such that the rel- 
evant Hennessy tests for each reachable equivalence class will be generated. To compute 
and cover the reachable equivalence classes, and to compute the timed test sequences, 
we employ efficient symbolic reachability techniques based on constraint solving that 
have recently been developed for model checking of timed automata [15, 6, 28, 4, 18]. 
In summary, the contributions of the paper are: 

- We propose a coarse equivalence class partitioning of the state space and use this 
for automatic test selection. 

- Other work on test generation for real-time systems allows deterministic 
specifications only, and uses trace inclusion as implementation relation. We permit both 
non-deterministic specifications and (black-box) implementations, and use an 
implementation relation based on Hennessy's testing theory that takes deadlocks into 
account. 

- To our knowledge, the recently developed symbolic reachability techniques have 
not previously been applied to test generation. 

- Our techniques are implemented in a prototype test generation tool, RTCAT. 

- We provide experimental data about the efficiency of our technique. Application of 
RTCAT to one small and one larger case study results in encouragingly small test 
suites. 

The remainder of the paper is organized as follows. Section 2 summarizes the 
related work. Section 3 introduces Hennessy tests, the specification language, and the 
symbolic reachability methods. Section 4 presents the test generation algorithm. 
Section 5 contains our experimental results. Section 6 concludes the paper and suggests 
future work. 



2 Related Work 

Springintveld et al. proved in [27] that exhaustive testing wrt. trace equivalence of 
deterministic timed automata with a dense time interpretation is theoretically possible, 
but highly infeasible in practice. Another result, generating checking sequences for a 
discretized deterministic timed automaton, is presented by En-Nouaary et al. in [16]. 
Although the required discretization step size ($1/(|X| + 2)$, where $|X|$ is the number 
of clocks) in [16] is more reasonable than in [27], it still appears to be too small for most 
practical applications because too many tests are generated. Both of these techniques are 
based on the so-called region graph technique due to Alur and Dill [1]. Clock regions 
are very fine-grained equivalence classes of clock valuations. We argue that coarser 
partitions are needed in practice. Further, our equivalence class partitioning as well as 
the symbolic techniques used are much less sensitive to the clock constants and the 
number of clocks appearing in the specification compared to the region construction. 

Cardell-Oliver and Glover showed in [9] how to derive checking sequences from a 
discrete time, deterministic, timed transition system model. Their approach is imple- 
mented in a tool which is applied to a series of small cases. Their result indicates that 
the approach is feasible, at least for small systems, but problems arise if the implemen- 
tation has more states than the specification. No test selection wrt. the time dimension 
is performed, i.e., an action is taken at all the time instances it is enabled. 

Clarke and Lee [11, 12] also propose domain testing for real-time systems. 
Although their primary goal of using testing as a means of approximating verification to 
reduce the state explosion problem is different from ours, their generated tests could 
potentially be applied to physical systems as well. Their technique appears to produce 
far fewer tests than region based generation. The time requirements are specified 
as directed acyclic graphs called constraint graphs. Compared to timed automata this 
specification language appears very restricted, e.g., since their constraint graphs must 
be acyclic this only permits specification of finite behaviors. Their domains are “nice” 
linear intervals which are directly available in the constraint graph. In our work they are 
(convex) polyhedra of a dimension equal to the number of clocks. 

Braberman et al. [8] describe an approach where a structured analysis/structured 
design real-time model is represented as a timed Petri net. Analysis methods for timed 
Petri nets based on constraint solving can be used to generate a symbolic timed reach- 
ability tree up to a predefined time bound. From this, specific timed test sequences can 
be chosen. This work shares with ours the generation of tests from a symbolic 
representation of the state space. We guarantee coverage according to a well-defined criterion 
without reference to a predefined or explicitly given upper time bound. The paper also 
proposes other selection criteria, mostly based on the type and order of the events in the 
trace. However, they are concerned with generating traces only, and not with deadlock 
properties as we are. The paper describes no specific data structures or algorithms for 
constraint solving, and states no results regarding their efficiency. Their approach does 
not appear to be implemented. 

Castanet et al. present in [10] an approach where timed test traces can be 
generated from timed automata specifications. Test selection must be done manually through 
engineer-specified test purposes (one for each test) themselves given as deterministic 
acyclic timed automata. Such explicit test selection reduces the state explosion prob- 
lem during test generation, but leaves a significant burden on the engineer. Further, the 
test sequences appear to be synthesized from paths available directly in an intermedi- 
ate timed automaton formed by a synchronous product of the specification and the test 
purpose, and not from a (symbolic) interpretation thereof. This approach therefore risks 
generating tests which need not be passed by the implementation, or not finding a test 
satisfying the test purpose when one in fact exists. 

Finally, test generation from a discrete time temporal logic is investigated in [20]. 



3 Preliminaries 

3.1 Hennessy Tests 

In Hennessy's testing theory [21], specifications $S$ are defined as finite state labelled 
transition systems over a given finite set of actions $Act$. Also, it assumes that 
implementations $X$ (and specifications) can be observed by finite tests $T$ via a sequence 
of synchronous CCS-like communications. So, the execution of a test consists of a 
finite sequence of communications forming a so-called computation, denoted by 
$Comp(T \parallel X)$ (or $Comp(T \parallel S)$). A test execution is assigned a verdict (pass, fail 
or inconclusive), and a computation is successful if it terminates after an observation 
having the verdict pass. 

Hennessy tests have the following abstract syntax $\mathcal{L}_{tlts}$: (1) after $\sigma$ must $A$, (2) 
can $\sigma$, and (3) after $\sigma$ must $\emptyset$, where $\sigma \in Act^*$ and $A \subseteq Act$. Informally, (1) is 
successful if at least one of the observations in $A$ (called a must set) can be observed 
whenever the trace $\sigma$ is observed, (2) is successful if $\sigma$ is a prefix of the observed system, 
and (3) is successful if this is not the case (i.e. $\sigma$ is not a prefix). 

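Definition 1. The Testing Preorder $\sqsubseteq_{te}$: 

1. $S$ must $T$ iff $\forall \sigma \in Comp(T \parallel S).\ \sigma$ is successful. 

2. $S$ may $T$ iff $\exists \sigma \in Comp(T \parallel S).\ \sigma$ is successful. 

3. $S \sqsubseteq_{must} X$ iff $\forall T \in \mathcal{L}_{tlts}.\ S$ must $T$ implies $X$ must $T$. 

4. $S \sqsubseteq_{may} X$ iff $\forall T \in \mathcal{L}_{tlts}.\ S$ may $T$ implies $X$ may $T$. 

5. $S \sqsubseteq_{te} X$ iff $S \sqsubseteq_{must} X$ and $S \sqsubseteq_{may} X$. 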

□ 

Specifications and implementations are compared by the tests they pass. The must 
(may) preorder requires that every test that must (may) be passed by the specification 
must (may) also be passed by the implementation. In non-deterministic systems these 
notions do not coincide. The testing preorder defined formally in Definition 1 requires 
satisfaction of both the must and may preorders. 

A must test after $\sigma$ must $A$ can be generated from a specification by 1) 
finding a trace $\sigma$ in the specification, 2) computing the states that are reachable after that 
trace, and 3) computing a set of actions $A$ that must be accepted in these states. To 
facilitate and ease systematic generation of all relevant tests, the specification can be 
converted to a success graph (or acceptance graph [13]) data structure. A success graph 
is a deterministic state machine trace equivalent to the specification, and whose nodes 
are labeled with the must sets holding in that node, the set of actions that are possible, 
and the actions that must be refused. 
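
The three steps above can be illustrated on an untimed, finite labelled transition system. The following is only a minimal sketch under our own assumptions: the dictionary encoding of the transition system, the function names, and the example machine are hypothetical and are not taken from the paper or from RTCAT.

```python
from typing import Dict, List, Set, Tuple

# A finite, possibly non-deterministic LTS: state -> list of (action, next state).
LTS = Dict[str, List[Tuple[str, str]]]


def states_after(lts: LTS, start: str, trace: List[str]) -> Set[str]:
    """Step 2: all states reachable from `start` by performing exactly `trace`."""
    current = {start}
    for action in trace:
        current = {t for s in current for (a, t) in lts.get(s, []) if a == action}
    return current


def can(lts: LTS, start: str, trace: List[str]) -> bool:
    """Test `can sigma`: sigma is a trace of the system."""
    return bool(states_after(lts, start, trace))


def after_must(lts: LTS, start: str, trace: List[str], must_set: Set[str]) -> bool:
    """Test `after sigma must A`: every state reached by sigma offers an action in A.

    With an empty must set this holds only if sigma is not a trace at all,
    which matches the refusal form `after sigma must {}`.
    """
    return all(
        any(a in must_set for (a, _) in lts.get(s, []))
        for s in states_after(lts, start, trace)
    )


# Hypothetical non-deterministic vending machine: a coin leads to s1 or s2.
spec: LTS = {
    "s0": [("coin", "s1"), ("coin", "s2")],
    "s1": [("cof", "s3")],
    "s2": [("thinCof", "s3")],
}
```

In this example `{"cof"}` alone is not a must set after `coin`, because the machine may have moved to `s2`, whereas `{"cof", "thinCof"}` is.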

We propose a simple timed generalization of Hennessy’s tests. In a timed test 
after $\sigma$ must $A$ (or after $\sigma$ must $\emptyset$), $\sigma$ becomes a timed trace (a sequence of 
alternating actions and time delays), after which an action in $A$ must be accepted immediately. 
Similarly, a test can $\sigma$ (after $\sigma$ must $\emptyset$) becomes a timed trace satisfied if $\sigma$ is (is not) 
a prefix trace of the observed system. A test will be modelled by an executable timed 
automaton whose locations are labelled with pass, fail, or inconclusive verdicts. 



3.2 Event Recording Automata 

Two of the surprising undecidability results from the theoretical work on timed 
languages described by timed automata are that 1) a non-deterministic timed automaton 
cannot in general be converted into a deterministic (trace) equivalent timed automaton, 
and 2) trace (language) inclusion between two non-deterministic timed automata is 
undecidable [2]. Thus, unlike the untimed case, deterministic and non-deterministic timed 
automata are not equally expressive. The Event Recording Automata model (ERA) was 
proposed by Alur, Fix, and Henzinger in [2] as a determinizable subclass of timed 
automata, which enjoys both properties. 

Definition 2. Event Recording Automaton: 

1. An ERA is a tuple $(Act, N, l_0, E)$ where $Act$ is the set of actions, $N$ is a (finite) 
set of locations, $l_0 \in N$ is the initial location, and $E \subseteq N \times G(X) \times Act \times N$ is 
the set of edges. We use the term location to denote a node in the automaton, and 
reserve the term state to denote the semantic state of the automaton also including 
clock values. 

2. $X = \{x_a \mid a \in Act\}$ is the set of clocks. The guards $G(X)$ are generated by the 
syntax $g ::= \gamma \mid g \wedge g$ where $\gamma$ is a constraint of the form $x_1 \sim c$ or $x_1 - x_2 \sim c$ 
with $\sim \in \{\le, <, =, >, \ge\}$, $c$ a non-negative integer constant, and $x_1, x_2 \in X$. 

□ 

Like a timed automaton, an ERA has a set of clocks which can be used in guards 
on actions, and which can be reset when an action is taken. In ERAs however, each 
action $a$ is uniquely associated with a clock called the event clock of $a$. Whenever 
an action $a$ is executed, the event clock $x_a$ is automatically reset. No further clock 
assignments are permitted. The event clock $x_a$ thus records the amount of time passed 
since the last occurrence of $a$. In addition, no internal $\tau$ actions are permitted. These 
restrictions are sufficient to ensure determinizability [2]. We shall finally also assume 
that all observable actions are urgent, meaning that synchronization between the 
environment and automaton takes place immediately when the parties have enabled a pair of 
complementary actions. With non-urgent observable actions this synchronization delay 
would be unbounded. 




Fig. 3. ERA specification of a coffee vending machine (a), and determinized machine 
(b). 



Figure 3a shows an example of a small ERA. It models a coffee vending machine 
built for impatient users such as busy researchers. When the user has inserted a coin 
(coin), he must press the give button (give) to indicate his eagerness to get a drink. 
If he is very eager, he presses give soon after inserting the coin, and the vending 
machine outputs thin coffee (thinCof); apparently, there is insufficient time to brew 
good coffee. If he waits more than four time units, he is certain to get good coffee (cof). 
If he presses give after exactly four time units, the outcome is non-deterministic. 

In a deterministic timed automaton, the choice of the next edge to be taken is uniquely 
determined by the automaton’s current location, the input action, and the time the input 
event is offered. The determinization procedure for ERAs is given by [2], and is con- 
ceptually a simple extension of the usual subset construction used in the untimed case, 
only now the guards must be taken into account. Figure 3b illustrates the technique. 
Observe how the guards of the give edges from {s2} become mutually exclusive such 
that either both are enabled, or only one of them is. 
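
To make the role of the event clocks concrete, the sketch below simulates an ERA on a timed trace and returns the set of locations the automaton may occupy afterwards, which is the information a subset-style determinization tracks. It is an illustration only: the class `ERA`, the location names `s1` to `s4`, the guards on `give`, and the choice to start all event clocks at zero are our assumptions, since Figure 3 is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Clocks = Dict[str, float]  # value of event clock x_a, keyed by action name a
Edge = Tuple[str, Callable[[Clocks], bool], str, str]  # (source, guard, action, target)


@dataclass
class ERA:
    initial: str
    edges: List[Edge]

    def locations_after(self, timed_trace: List[Tuple[float, str]]) -> Set[str]:
        """Locations possibly occupied after a trace of (delay, action) pairs."""
        # Assumption: every event clock starts at 0.
        clocks: Clocks = {action: 0.0 for (_, _, action, _) in self.edges}
        locations = {self.initial}
        for delay, action in timed_trace:
            clocks = {a: v + delay for a, v in clocks.items()}
            locations = {
                target
                for (source, guard, act, target) in self.edges
                if source in locations and act == action and guard(clocks)
            }
            # The event clock of the executed action is reset automatically.
            clocks[action] = 0.0
        return locations


# Hypothetical encoding of the coffee machine described in the text.
coffee = ERA(
    initial="s1",
    edges=[
        ("s1", lambda x: True, "coin", "s2"),
        ("s2", lambda x: x["coin"] <= 4, "give", "s3"),  # eager user: thin coffee
        ("s2", lambda x: x["coin"] >= 4, "give", "s4"),  # patient user: good coffee
        ("s3", lambda x: True, "thinCof", "s1"),
        ("s4", lambda x: True, "cof", "s1"),
    ],
)
```

Because the clock values depend only on the observed timed trace and not on which branch was taken, a single clock valuation can be shared by all locations in the set. This is the property that makes the subset construction applicable to ERAs.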

3.3 Symbolic Representation 

Timed automata with a dense time interpretation cannot be analyzed by finite state tech- 
niques, but must rather be analyzed symbolically. Efficient symbolic reachability tech- 
niques have been developed for model checking of timed automata [15, 6, 28, 4, 18]. 
Specifically, we shall employ techniques similar to those developed for the UPPAAL 
tool [28, 4, 18]. 

The state of a timed automaton can be represented by the pair $(l, u)$, where $l$ is the 
automaton’s current location (vector), and where $u$ is the vector of its current clock 
values. A zone $z$ is a conjunction of clock constraints of the form $x_1 \sim c$ or 
$x_1 - x_2 \sim c$ with $\sim \in \{\le, <, =, >, \ge\}$, or equivalently, the solution set to these constraints. 
A symbolic state $[l, z]$ represents a (infinite) set of states: $\{(l, u) \mid u \in z\}$. Forward 
reachability analysis starts in the initial state, and computes the symbolic states that 
can be reached by executing an action or a delay from an existing one. When a new 
symbolic state is included in one previously visited, no further exploration of the new 
state needs to take place. Forward reachability thus terminates when no new states can 
be reached. A concrete timed trace to a given state or set of states can be computed 
by back propagating its constraints along the symbolic path used to reach it, and by 
choosing specific time points along this trace. 

Zones can be represented and manipulated efficiently by the difference bound matrix 
(DBM) data structure. DBMs were first applied to represent clock differences by Dill 
in [15]. A DBM represents clock difference constraints of the form $x_i - x_j \sim c_{ij}$ by 
an $(n + 1) \times (n + 1)$ matrix such that $c_{ij}$ equals matrix element $(i, j)$, where $n$ is the 
number of clocks, and $\sim \in \{\le, <\}$. 
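
As a rough illustration of how such a matrix is manipulated, the sketch below stores only non-strict bounds ($x_i - x_j \le c_{ij}$), with index 0 standing for a reference clock that is always 0. This simplification, the function names, and the use of Floyd-Warshall tightening are our assumptions; a real DBM library also tracks strictness.

```python
import math
from typing import List

INF = math.inf
DBM = List[List[float]]  # dbm[i][j] = c means x_i - x_j <= c; x_0 is the constant 0


def canonical(dbm: DBM) -> DBM:
    """Tighten every bound via all-pairs shortest paths (Floyd-Warshall)."""
    n = len(dbm)
    d = [row[:] for row in dbm]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_empty(dbm: DBM) -> bool:
    """A zone has no solution iff tightening produces a negative diagonal entry."""
    d = canonical(dbm)
    return any(d[i][i] < 0 for i in range(len(d)))


def includes(big: DBM, small: DBM) -> bool:
    """True if the non-empty zone `small` is contained in the zone `big`."""
    b, s = canonical(big), canonical(small)
    n = len(b)
    return all(s[i][j] <= b[i][j] for i in range(n) for j in range(n))


# Clocks x (index 1) and y (index 2): x <= 4 and y - x <= 2, both non-negative.
zone = [
    [0, 0, 0],
    [4, 0, INF],
    [INF, 2, 0],
]
```

Tightening `zone` derives the implied bound $y \le 6$, and the inclusion check is the test used during forward reachability to decide whether a newly computed symbolic state has already been covered.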

4 A Test Generation Algorithm 

Our equivalence class partitioning and coverage criterion are introduced in Section 4.1. 
An algorithm for constructing the equivalence classes of a specification is provided in 
Section 4.2. The test generation algorithm is presented in Section 4.3. 

4.1 State Partitioning 

Since exhaustive testing is generally infeasible, it is important to systematically select 
and generate a limited number of tests. A test selection criterion (or coverage criterion) 
is a rule describing what behavior or requirements should be tested. Coverage is a metric 
of completeness with respect to a test selection criterion. In industrial projects it is 
highly desirable that there is such a well defined metric of the testing thoroughness, and 
that this can be measured. 

We propose a criterion based on partitioning the state space of the specification into 
coarse equivalence classes, and requiring that the test suite for each class makes a set 
of required observations of the implementation when it is expected to be in a state in 
that class. These observations are used to increase the confidence that the equivalence 
classes are correctly implemented. The partitioning and observations can be done in 
numerous ways, and some options are explored and formally defined in [22]. Given the 
partitioning stated in the following, the stable edge set criterion implemented in RTCAT 
requires that all relevant simple deadlock observations of the forms after $\varepsilon$ must $A$ (a 
must property), after $a$ must $\emptyset$ (a refusal property), and can $a$ (a may property) are 
made at least once in each class. 

From each control location $L$ (recall that a location in a deterministic automaton is 
the set of locations of the original automaton that the automaton can possibly occupy 
after a given trace), the clock valuations are partitioned such that two clock valuations 
belong to the same equivalence class iff they enable precisely the same edges from 
$L$, i.e. the states are equivalent wrt. the enabled edges. An equivalence class will be 
represented by a pair $[L,p]$, where $L$ is a set of location vectors, and $p$ is the 
inequation describing the clock constraints that must hold for that class, i.e., $[L,p]$ is the set 
of states $\{(L,u) \mid u \in p\}$. Further, to obtain contiguous convex equivalence classes, 
and to reuse the existing efficient symbolic techniques, this constraint is rewritten to 
its disjunctive normal form. Each disjunct is treated as its own equivalence class. The 
partitioning from a given set of locations is defined formally in Definition 4. 



Definition 4. State partitioning $\Psi(L)$: 

Let $L$ be a set of location vectors, $E(L)$ the set of edges starting in a location vector in 
$L$, $E$ a set of edges, and $\Gamma(E) = \{g \mid l \xrightarrow{g,a} l' \in E\}$. Recall from Definition 2 that 
$G(X)$ denotes the guards generated by the syntax $g ::= \gamma \mid g \wedge g$ where $\gamma$ is a basic 
clock constraint of the form $x_1 \sim c$ or $x_1 - x_2 \sim c$. 

Let $P$ be a constraint over clock inequations $\gamma$ composed using any of the logical 
connectives $\wedge$, $\vee$, or $\neg$. Let DNF($P$) denote a function that rewrites constraint $P$ to 
its equivalent disjunctive normal form, i.e., such that $\bigvee_i \bigwedge_j \gamma_{ij} = P$. Each conjunct 
in the disjunctive form can be written as a guard $g$ in $G(X)$ by appropriately 
negating basic clock constraints where required. The disjunctive normal form can 
therefore be interpreted as a disjunction of guards such that $\bigvee_i g_i = \bigvee_i \bigwedge_j \gamma_{ij}$. The set 
of guards $g_i$ whose disjunction equals the disjunctive normal form is denoted GDNF, 
i.e., GDNF($P$) $= \{g_i \in G(X) \mid \bigvee_i g_i = \mathrm{DNF}(P)\}$. 

1. $\Psi(L) = \{P_E \mid E \subseteq E(L)\}$, where $P_E = \bigwedge_{g \in \Gamma(E)} g \;\wedge \bigwedge_{g \in \Gamma(E(L) - E)} \neg g$. 

2. $\Psi_{dnf}(L) = \bigcup_{P_E \in \Psi(L)} \mathrm{GDNF}(P_E)$. 



□ 
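
For intuition, the partitioning can be sketched for the special case of a single clock, where every guard compares that clock against integer constants. The function below is a hypothetical illustration and not the authors' algorithm: instead of constructing $P_E$ and its disjunctive normal form symbolically, it samples one value per constant and per open interval between consecutive constants, which is sufficient only under the stated single-clock assumption.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

EquivClass = Tuple[List[str], FrozenSet[str]]  # (interval labels, enabled edges)


def partition_one_clock(
    guards: Dict[str, Callable[[float], bool]], constants: List[int]
) -> List[EquivClass]:
    """Split [0, inf) into contiguous pieces with a constant set of enabled edges.

    Assumption: each guard only compares the clock x against values in `constants`.
    """
    points = sorted(set(constants) | {0})
    pieces: List[Tuple[str, float]] = []
    for idx, c in enumerate(points):
        pieces.append((f"x = {c}", float(c)))
        if idx + 1 < len(points):
            nxt = points[idx + 1]
            pieces.append((f"{c} < x < {nxt}", (c + nxt) / 2))
        else:
            pieces.append((f"x > {c}", c + 1.0))

    classes: List[EquivClass] = []
    for label, sample in pieces:
        enabled = frozenset(e for e, g in guards.items() if g(sample))
        if classes and classes[-1][1] == enabled:
            # Adjacent pieces with the same enabled edges form one convex class.
            classes[-1] = (classes[-1][0] + [label], enabled)
        else:
            classes.append(([label], enabled))
    return classes


# Hypothetical guards of the two `give` edges of the coffee machine.
give_guards = {
    "give->thinCof": lambda x: x <= 4,
    "give->cof": lambda x: x >= 4,
}
coffee_classes = partition_one_clock(give_guards, [4])
```

For the coffee machine this yields three classes (before, at, and after four time units), far fewer than a region-based partition of the same clock would contain.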

Our partitioning is based on the guards that actually occur in a specification, and is 
therefore much coarser than e.g., the region partitioning which is based on the guards 
that could possibly occur in an automaton according to the syntax in Definition 2. It 
also has the nice formal property that the states in the same equivalence class are also 
equivalent with respect to the previously stated simple deadlock properties. This follows 
from the absence of $\tau$ actions, and since only enabled edges, and not the precise clock 
values, affect the satisfaction of these properties. In contrast, different equivalence 
classes typically satisfy different simple deadlock properties. It is therefore natural to 
check that the implementation matches these properties for each equivalence class. Us- 
ing an even coarser partitioning is therefore likely to leave out significant timing and 
deadlock behavior. 

Each equivalence class $[L,p]$ can now be decorated with the action sets $M$, $C$, $R$ 
defined in Definition 5. 

Definition 5. Decorated Equivalence Classes: 

Define $Must([L,p]) = \{A \mid \exists (L,u) \in [L,p].\ (L,u) \models \text{after } \varepsilon \text{ must } A\}$ and 
$Sort([L,p]) = \{a \mid \exists (L,u) \in [L,p].\ (L,u) \xrightarrow{a}\}$. 

1. $M([L,p]) = Must([L,p])$. 

2. $C([L,p]) = Sort([L,p])$. 

3. $R([L,p]) = Act - Sort([L,p])$. 

□ 

$M$ contains the sets of actions necessary to generate the must tests, $C$ the may tests, 
and $R$ the refusal tests for that class. Specifically, if $\sigma$ is a timed trace leading to class 
$[L,p]$, and $A \in M([L,p])$, then after $\sigma$ must $A$ is a test to be passed for that class. So 
is after $\sigma \cdot a$ must $\emptyset$ if $a \in R([L,p])$, and can $\sigma \cdot a$ if $a \in C([L,p])$. The number 
of generated tests can be further reduced by removing tests that are logically passed 
by another test. The must sets can be reduced to $M([L,p]) = \min_{\subseteq} Must([L,p])$. The 
actions observed during the execution of a must test can be removed from the may tests, 
i.e., $C([L,p]) = Sort([L,p]) - \bigcup_{A \in M([L,p])} A$. 
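
The reduced decorations can be sketched as follows. This is an illustration under an assumption of ours: we take a set $A$ to be a must set of a class when every original location in $L$ has at least one enabled action in $A$, which is how we read the must property in the absence of internal actions. The function name and input encoding are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def decorate(enabled: Dict[str, Set[str]], alphabet: Set[str]) -> Dict[str, object]:
    """Compute reduced M, C and R for one class.

    `enabled` maps each original location in L to the actions enabled there
    for the clock values of the class.
    """
    sort: FrozenSet[str] = frozenset().union(*enabled.values())
    minimal: List[FrozenSet[str]] = []
    # Enumerate candidates by increasing size so that kept sets are subset-minimal.
    for size in range(1, len(alphabet) + 1):
        for combo in combinations(sorted(alphabet), size):
            candidate = frozenset(combo)
            is_must = all(candidate & actions for actions in enabled.values())
            if is_must and not any(m <= candidate for m in minimal):
                minimal.append(candidate)
    observed_by_must = frozenset().union(*minimal)
    return {
        "M": minimal,                      # minimal must sets
        "C": sort - observed_by_must,      # may tests not already covered by M
        "R": frozenset(alphabet) - sort,   # actions that must be refused
    }


# Class reached in the coffee machine when `give` is pressed at exactly 4 time units
# (location names are the hypothetical ones used for illustration).
decoration = decorate(
    {"s3": {"thinCof"}, "s4": {"cof"}},
    {"coin", "give", "cof", "thinCof"},
)
```

Here the only minimal must set is `{cof, thinCof}`, no separate may test remains, and `coin` and `give` must be refused. The subset enumeration is exponential in the alphabet size and is meant only to mirror the definitions.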

4.2 Equivalence Class Graph Construction 

We view the state space of the specification as a graph of equivalence classes. A node in 
this graph contains an equivalence class. An edge between two nodes is labeled with 
an observable action, and represents the possibility of executing an action in a state 
in the source node, waiting some amount of time, and thereby entering a state in the 
target node. The graph is constructed by starting from an existing node $[L,p]$ (initially 
the equivalence classes of the initial location), and then for each enabled action $a$, by 
computing the set of locations $L'$ that can be entered by executing the $a$ action from the 
equivalence class. Then the partitions $p'$ of location $L'$ can be computed according to 
Definition 4 (2). Every $[L',p']$ is then an $a$ successor of $[L,p]$. It should be noted that 
only equivalence classes whose constraints have solutions need to be represented. The 
equivalence class graph is defined inductively in Definition 6. This definition can easily 
be turned into an algorithm for constructing the equivalence class graph. 

Definition 6. Equivalence Class Graph: 

The nodes and edges are defined inductively as: 

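1. The set $\{[L_0,p] \mid L_0 = \{\bar{l}_0\} \text{ and } p \neq \emptyset\}$ are nodes.

2. If $[L,p]$ is a node, so is $[L',p']$, and $[L,p] \xrightarrow{a} [L',p']$ is an edge if $p' \neq \emptyset$, where
$L' = \{\bar{l}' \mid \exists \bar{l} \in L.\ \bar{l} \xrightarrow{a} \bar{l}'\}$, and $p' \in \Psi_{dnf}(L')$.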

□ 
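An inductive definition of this shape corresponds to a standard worklist construction. The following Python sketch shows the idea only; it assumes a hypothetical callback `successors(node)` that yields `(action, target)` pairs for the non-empty partitions, and it treats nodes as opaque hashable values instead of real location sets and clock constraints.

```python
from collections import deque


def build_class_graph(initial_nodes, successors):
    """Worklist construction of the nodes and labeled edges of the graph."""
    nodes = set(initial_nodes)
    edges = set()
    work = deque(nodes)
    while work:
        node = work.popleft()
        for action, target in successors(node):
            edges.add((node, action, target))
            if target not in nodes:  # each class is expanded only once
                nodes.add(target)
                work.append(target)
    return nodes, edges
```

Because a node is expanded at most once, the construction terminates whenever the number of equivalence classes is finite.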

The construction algorithm implicitly determinizes the specification. The equivalence
class graph preserves all timed traces of the specification, and furthermore preserves
the required deadlock information for our timed Hennessy tests of the specification
by the M, C, and R action sets stored in each node. The non-determinism found
in the original specification is therefore not lost, but is represented differently, and in
a way that is more convenient for test generation: a test is composed of a trace, a
deadlock observation possible in the specification thereafter, and associated verdicts,
and this information can be found simply by following a path in the equivalence class
graph. All timed Hennessy tests that the specification passes can thus be generated from 
this graph. The explicit graph also makes it easy to ensure coverage according to the 
coverage criterion by marking the visited parts of the graph during test generation. The 
equivalence class graph for the coffee machine is depicted in Figure 7. 




Fig. 7. Equivalence class graph for the coffee machine. 



4.3 Overall Algorithm 

The equivalence class graph preserves the necessary information for generating timed
Hennessy tests. However, it also contains behavior and states not found in the specifi- 
cation, and using such behavior will result in irrelevant and unsound tests. An unsound 
test may produce the verdict fail even when the implementation conforms to the specifi- 
cation. According to the testing preorder only tests passed by the specification should be 
generated. To ensure soundness, only the traces and deadlock properties actually con- 
tained in the specification may be used in a generated test. To find these, we therefore 
interpret the specification symbolically, and generate the timed Hennessy tests from a 
representation of only the reachable states and behavior. Moreover, the use of reach- 
ability analysis gives a termination criterion for this interpretation; when completed it 
guarantees that every reachable equivalence class is represented by some symbolic state. 
Thus, we are able to guarantee coverage by inspecting the reached symbolic states. 

Algorithm 8 presents the main steps of our generation procedure. Step 1 constructs
the equivalence class graph as described in Section 4.2. The result of step 2 is a symbolic
reachability graph. Nodes in this graph consist of symbolic states $[L, z/p]$ where $L$ is a
set of location vectors, and where $z$ is a constraint characterizing a set of reachable clock
valuations also in $p$, i.e., $z \subseteq p$. An edge represents that the target state is reachable by
executing an action from the source state and then waiting some amount of time. 

The nodes in the reachability graph are decorated according to Definition 5 in step 3. 
The boolean flag toBeTested indicates whether test cases should be made for this sym- 
bolic state or they should be omitted. If no tests should be made, the only actions exe- 
cuted from this state will be those necessary to reach other symbolic states. Normally 
this flag would be set only the first time an equivalence class is reached during the for- 
ward reachability analysis in the previous step. Subsequent passes over the same class 
would hence be ignored. This ensures that each simple deadlock property is only gen- 
erated once per equivalence class, and thus reduces the number of produced test cases. 
Different settings of this flag permit other strategies to be easily implemented. Other 
strategies could be to test all reached symbolic states, or only test certain designated 
locations deemed critical by the user. 
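The default strategy, in which the flag is set only the first time an equivalence class is reached, might be sketched as follows. The names `mark_to_be_tested`, `successors` and `class_of` are hypothetical, symbolic states are treated as opaque hashable values, and the breadth-first order is just one possible choice.

```python
from collections import deque


def mark_to_be_tested(initial, successors, class_of):
    """Forward exploration that flags the first symbolic state of each class."""
    seen_states = {initial}
    seen_classes = set()
    to_be_tested = {}
    work = deque([initial])
    while work:
        state = work.popleft()  # popleft() gives breadth first, pop() depth first
        cls = class_of(state)
        to_be_tested[state] = cls not in seen_classes
        seen_classes.add(cls)
        for nxt in successors(state):
            if nxt not in seen_states:
                seen_states.add(nxt)
                work.append(nxt)
    return to_be_tested
```

Other strategies only change how the flag is computed, for instance by returning true for every state or only for states in designated locations.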

Algorithm 8. Overall Test Case Generation Algorithm: 
input: ERA specification S. 

output: A complete covering set of timed Hennessy tests to be passed. 

1. Compute $S_p$ = EquivalenceClassGraph($S$).

2. Compute $S_r$ = ReachabilityGraph($S_p$).

3. Label every $[L, z/p] \in S_r$ with the sets $M$, $C$, $R$, and boolean flag toBeTested.

4. Traverse $S_r$. For each $[L, z/p]$ in $S_r$:

if toBeTested($[L, z/p]$) then enumerate tests:

(a) Choose $(\bar{l}, u) \in [L, z/p]$.

(b) Compute a concrete timed trace $\sigma$ in $S_r$ from $(\bar{l}_0, 0)$ to $(\bar{l}, u)$.

(c) Make test cases to be passed:

if $A \in M([L,p])$ then after $\sigma$ must $A$ is a test.

if $a \in C([L,p])$ then can $\sigma \cdot a$ is a test.

if $a \in R([L,p])$ then after $\sigma \cdot a$ must $\emptyset$ is a test.

□ 

Step 4 contains the generation process itself. If a particular point in the symbolic 
state is of interest, such as an extreme value, this must be computed (step 4a). When a 
point has been chosen, a trace leading to it from the initial state is computed (step 4b). 
Finally, in step 4c, a test case can be generated for each of the must, may, and refusal
properties holding in that symbolic state, and can finally be output as a test automaton 
in whatever output format is desired. 
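Step 4c can be illustrated by a short Python sketch that enumerates the individual tests for one symbolic state. It is a simplification under assumptions: the function name `hennessy_tests` is hypothetical, the timed trace is given as a list of already formatted steps, and tests are rendered as plain strings rather than as test automata.

```python
def hennessy_tests(trace, must_sets, may_actions, refused_actions):
    """Enumerate must, may and refusal tests after a given trace."""
    sigma = " . ".join(trace) if trace else "epsilon"
    tests = []
    for action_set in sorted(must_sets, key=sorted):
        members = ", ".join(sorted(action_set))
        tests.append("after " + sigma + " must {" + members + "}")
    for action in sorted(may_actions):
        tests.append("can " + sigma + " . " + action)
    for action in sorted(refused_actions):
        # A refusal test demands deadlock after the refused action.
        tests.append("after " + sigma + " . " + action + " must {}")
    return tests
```

Calling it with the trace `["coin"]`, the must set {give}, the may action thin and the refused action strong yields one test of each kind.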

It should be noted that the above algorithm generates individual timed Hennessy 
tests. In general, it is desirable to compose several of these properties into fewer tree 
structured tests. To facilitate test composition, the traversal and construction of test 
cases in step 4 should be done differently. A composition algorithm is implemented in 
RTCAT. Furthermore, the graphs in steps 1 and 2 can be constructed on-the-fly. Since
not all equivalence classes may be reachable, this could result in a smaller graph and 
less memory use during its construction.

5 Experimental Results

RTCAT accepts ERA specifications in AutoGraph format [25]. A specification may
consist of several ERAs operating in parallel, and communicating via shared clocks and
integer variables, but no internal synchronization is allowed as stated in Section 3.2.
Other features are described in [22]. RTCAT comprises about 22K lines of C++ code,
and is based on code from a simulator for timed automata (part of an old version of
the UppAal toolkit [19]). Its AutoGraph file format parser was reused with some
minor modifications to accommodate the ERA syntax. Its DBM implementation
was also reused, with some added operations for zone extrapolation and clock scaling.





Fig. 9. Example tests generated from the coffee machine in Figure 3. Filled states are 
fail states, and unfilled states are pass states. Diamonds contain actions to be refused 
at the time indicated at its top. Act denotes the set of all actions.



Figure 9 shows some examples of generated test cases from the coffee machine
specification in Figure 3a. RTCAT has been configured to select test points in the
interior of the equivalence classes. To analyze the feasibility of our techniques we have
created an ERA version of the frequently studied Philips audio protocol [5, 4] and a
simple token passing protocol, applied RTCAT, and measured the number and length of
the generated tests, the number of reached (convex) equivalence classes and symbolic
states, and the space and time needed to generate the tests and output them to a file.
The ERA models can be found in [22]. The platform used in the experiment consists
of a Sun Ultra-250 workstation running Solaris 5.7. The machine is equipped with 1 GB
RAM and two 400 MHz CPUs. No extra compiler optimizations were applied to the code.
The results are tabulated in Table 10.

The size of the produced test suites is in all combinations quite manageable, and
the suites could easily be executed in practice. There is thus a large
margin allowing for more test points per equivalence class, or longer tests. Moreover,
coverage of even larger specifications can also be obtained. Since the reached symbolic
states are labeled toBeTested during construction of the reachability graph, the
construction order may influence the number and length of tests. Our results show that
depth first construction generates slightly fewer composed tests than breadth first, but also
considerably longer test suites. This suggests that breadth first should be used when the
most economical covering test suite is desired, and that depth first should be used when a
covering test suite is desired that also checks longer sequences of interactions.





                      Breadth First                   |  Depth First
Specification         CofM  Phil (R)  Phil (S)  Token |  CofM  Phil (R)  Phil (S)  Token
Equivalence Classes     14        60        47     42 |    14        60        47     42
Symbolic States         17        71        97  15427 |    17        85        98   7283
Time (s)                 1         1         2    541 |     1         2         2    158
Memory (MB)              5         5         5     40 |     5         5         5     24
C-Number of Tests       16        97        68     71 |    16        86        67     60
C-Total Length          45       527       393    574 |    45      1619       487   5290
I-Number of Tests       22       118        85     84 |    22       118        85     84
I-Total Length          58       614       467    665 |    58      2103       587   6321



Table 10. Experimental results from generating tests from the coffee machine, the
Philips audio protocol receiver component, sender component with collision detection,
and 7-node token passing protocol. I = individually generated tests (Algorithm 8),
C = composed tests.
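The trade-off between the two construction orders can be made concrete by dividing total length by number of tests. The Python sketch below does this for the composed tests, using figures copied from Table 10; the dictionary layout and the helper `mean_length` are illustrative choices only.

```python
# (number of tests, total length) of the composed tests, copied from Table 10.
COMPOSED = {
    "CofM":     {"breadth": (16, 45),  "depth": (16, 45)},
    "Phil (R)": {"breadth": (97, 527), "depth": (86, 1619)},
    "Phil (S)": {"breadth": (68, 393), "depth": (67, 487)},
    "Token":    {"breadth": (71, 574), "depth": (60, 5290)},
}


def mean_length(tests, total):
    """Average length of a test in a suite."""
    return total / tests


for spec, runs in COMPOSED.items():
    breadth = mean_length(*runs["breadth"])
    depth = mean_length(*runs["depth"])
    print(f"{spec:9s} breadth first {breadth:6.1f}   depth first {depth:6.1f}")
```

For the token passing protocol this gives an average of roughly 8 steps per composed test with breadth first construction against roughly 88 with depth first.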



The tabulated figures for space and time consumption are the maxima observed;
generally test composition takes slightly longer and uses a little extra space. For the
first three specifications, the space and time consumption is quite low, and indicates 
that fairly large specifications can be handled. However, we have also encountered a 
problem with our current implementation which occurs for some specifications (such 
as the token passing protocol), where our application of the symbolic reachability techniques
becomes a bottleneck. When the specification uses a large set of active clocks
(one per node to measure the token holding time for that node plus one auxiliary in
the example), we observe that a large number of symbolic states is constructed
before the forward reachability analysis terminates. Consequently, considerably more
memory is used to guarantee complete coverage (up to 40 MB in Table 10). It is important to note that the size
of the produced test suite is still quite reasonable. We believe that this problem can be 
alleviated by applying the reachability analysis on the original specification automaton 
rather than as presently done on the equivalence class graph. This should result in larger 
and fewer symbolic states. Further, more sophisticated clock reduction algorithms could 
be applied [14], e.g., in the token passing protocol only one node may hold the token at 
a time, and thus one clock suffices. 

6 Conclusions and Future Work 



This paper presented a new technique for generating real-time tests from a restricted, 
but determinizable class of timed automata. The underlying testing theory is Hennessy’s
tests lifted to include timed traces. A principal problem is to generate a sufficiently small
test suite that can be executed in practice while maintaining a high likelihood of detect- 
ing unknown errors and obtaining the desired level of coverage. In our technique, the 
generated tests are selected on the basis of a coarse equivalence class partitioning of the 
state space of the specification. We employ the efficient symbolic techniques developed 
for model checking to synthesize the timed tests, and to guarantee coverage with respect 
to a coverage criterion. The techniques are implemented in a prototype tool. Application 
thereof to a realistic specification shows promising results. The test suite is quite small, 
and is constructed quickly, and with a reasonable memory usage. Our experiences, how- 
ever, also indicate a problem with our application of the symbolic reachability analysis, 
which should be addressed in future implementation work. Compared to previous work 
based on the region graph technique, our approach appears advantageous.

Much work remains to be done. In particular we are examining the possibilities
for generalizing our specification language. It will be important to allow specification 
and effective test of timing uncertainty, i.e., that an event must be produced or accepted 
at some (unspecified) point in an interval. Further, it should be possible to specify envi- 
ronment assumptions and to take these into account during test generation. Finally, our 
techniques should be examined with real applications, and the generated tests should be
executed against real implementations.

Abstract. Various attempts have been made to use genetic algorithms
(GAs) for software testing, a problem that consumes a large amount 
of time and effort in software development. We demonstrate the use of 
GAs in automating testing of complex data structures and methods for 
manipulating them, which to our knowledge has not been successfully 
displayed before on non-trivial software structures. We evaluate the ef- 
fectiveness of our GA-based test suite generation technique by applying 
it to test the design and implementation of the Intentional Naming Sys- 
tem (INS), a new scheme for resource discovery and service location in a 
dynamic networked environment. Our analysis using GAs reveals serious 
problems with both the design of INS and its inventors’ implementation. 



1 Introduction 

Genetic algorithms [7] are a family of computational models inspired by biologi- 
cal evolution. These algorithms encode a potential solution to a specific problem 
on a simple chromosome-like data structure and apply recombination operators 
to these structures so as to preserve critical information. Genetic algorithms are 
often viewed as function optimizers, although the range of problems to which 
they have been applied is quite broad [19]. 

There have been various attempts ([3], [6], [10], [14], [16], [18], [20]) to use 
genetic algorithms in software testing, a problem that is very labor intensive 
and expensive [2]. In this paper we explore the use of genetic algorithms in 
automating testing of complex data structures used in naming infrastructures 
for dynamic networks of computers and devices. 

Naming is a fundamental issue in distributed systems that is growing in 
importance as the number of directly accessible systems and resources grows to 
the point that it is difficult to discover the (names of) objects of interest. The 
difference between a true confederation of computing services and a collection of 
networked centralized computing systems lies in the system's ability to provide
a uniform and location independent way of accessing and naming resources. 

Testing architectures that provide service location and resource discovery
using location independent names in a worldwide internetwork is clearly a
challenging task.

1.1 Software Testing

Studies indicate that software testing consumes more than fifty percent of the 
cost of software development [2]. This percentage is even higher for critical soft- 
ware, such as that used for avionics systems. As software becomes more pervasive 
and is used more often to perform critical tasks, it will be required to be of higher 
quality. Unless we can find more efficient ways to perform effective testing, the 
percentage of development costs devoted to testing will increase significantly. 

Generation of test data to satisfy testing requirements is a particularly labor- 
intensive component of the testing process. For a given testing requirement, test 
data generation techniques try to identify a program input that will satisfy the 
selected testing criteria. If the process of test data generation is automated, 
significant reductions in the cost of software development could be achieved. 

Various test data generation techniques have been automated. Goal-oriented 
test data generators select inputs to execute the selected goal irrespective of the 
path taken (e.g. [13]). Random test data generators use some distribution to 
select random inputs (e.g. [15]). Structural or path-oriented test data generators 
make use of the program's control flow graph to select a particular path, and use
a technique such as symbolic evaluation to generate test data for the selected 
path (e.g. [5], [17]). Intelligent test data generators typically guide the search for 
new test data using complex analyses of the code (e.g. [4], [14]). 

In this paper we present a technique for automating test data generation 
using a genetic algorithm, that aims at testing structural properties of the data 
structures involved in the program and their associated methods. The genetic 
algorithm conducts its search by constructing new test data (next generation)
from previously generated test data (current generation) that are evaluated as
good candidates. The algorithm evaluates the candidate test data based on the 
code coverage achieved, the control points of interest executed or avoided, and 
the required properties satisfied. 

Our GA-based testing technique has four essential components. The first part 
is to identify methods to test and global properties of interest concerning these 
methods. The second part is to determine a genetic encoding such that each test 
datum encodes a sequence of operations of interest and their parameters. 

The third component involves computing the fitness of test data and has 
three subparts. First, we trivially modify the methods identified in the first part 
to reward test data that access them by incrementing their score per line of code 
executed in a method of interest. Second, we identify control points in code that 
are of particular interest and either add a bonus score or a penalty for executing 
that point. The rationale for doing so is explained in Section 3.2. Third, we award 
bonus points to test data that possess the properties identified in part one. This 
bonus or penalty is considerably greater than the score given per statement of 
execution in the first part. 

The final component of our framework is to apply standard genetic operators 
of evaluation, crossover, and mutation on the genetic representation of test data 
in the current generation and move onto the next generation.

An advantage of our approach is that since the genetic representation of each
test datum represents a sequence of operations of interest, it is straightforward 
to test the behavior of a program when such operations are interleaved. 

Another benefit is due to the idea of using barriers (i.e. awarding a large 
negative penalty for executing certain control points) as this induces new test 
data to evolve and identify bugs that have not already been discovered, without 
having to fix the ones previously found. 

Our framework, applied to generate test data automatically for the Intentional
Naming System (INS) [1] (Section 2), a new scheme for resource discovery and
service location in a dynamic networked environment, reveals serious flaws in
both the design and implementation [21] of INS. These flaws, to our knowledge,
were not previously known to the INS inventors. In particular, we establish that 
in the INS naming architecture, addition of a new service can cause a situation 
where the system makes valid services inaccessible to clients seeking them. 

In the next section, the background on INS is given. Then, the genetic al- 
gorithm for test data generation is described. Following that, the results from 
testing the INS implementation are presented. Next, the technique presented is 
compared to related work. Finally, conclusions and future work are given. 



2 INS Background 

One particular service discovery solution in dynamic networked environments is 
the Intentional Naming System (INS) [1], which allows services to describe and 
refer to each other using names which are intentional. These names describe a 
set of properties that the services should have rather than specify a low-level 
network location. The idea is to allow applications to refer to what service they 
want rather than where in the network topology the service resides. It also allows 
applications to communicate seamlessly with end-nodes, despite changes in the 
mapping from name to end-node addresses during the session. 

INS comprises applications and intentional name resolvers (INRs). Applica- 
tions may be clients or services with services providing the functionality or data 
required by clients. Like IP routers or conventional name servers, INRs route re- 
quests from clients to appropriate locations, using a database that maps service 
descriptions to their physical network locations. 

An INR provides a few fundamental operations. When a service wants to 
advertise itself - because, for example, it has come online after being down, or 
because its functionality has been extended - it calls the Add-Name operation 
to register the service against an advertisement describing it. Applications make 
queries by calling the resolver's Lookup-Name operation.

Intentional names are implemented in INS using name-specifiers that represent
both queries and advertisements. A name-specifier (Figure 1) is an arrangement
of alternating levels of attributes and values in a tree structure. In Figure
1, hollow circles identify attributes and filled circles identify values. Attributes 
represent categories in which an object can be classified. Each attribute has a

[Figure: name-specifier and name-tree panels; Lookup-Name(name-tree, name-specifier) = {R0}]




Fig. 1. Example of a Lookup-Name operation 



corresponding value that is the object's classification within that category. A
wild-card may be used in place of a value to show that any value is acceptable.

An attribute together with its value form an av-pair; each av-pair has a set 
of child av-pairs that specialize it to further describe the object. Orthogonal av- 
pairs specializing the same av-pair are siblings in the tree. The name-specifier in 
Figure 1 describes an object in building NE-43 that provides a camera service. 

An INR stores its information in a database called a name-tree (Figure 1). 
A name-tree resembles a super-positioning of several name-specifiers, and stores 
the correspondence between name-specifiers and name-records, which include the 
IP addresses of services advertising the name. 

A name-tree also has two fundamental building blocks, an attribute-node 
and a value-node. A value-node can have several attribute-nodes as children. 
Similarly, an attribute-node can have several value-nodes as children, each rep- 
resenting a distinct value the name-tree knows. 

A value-node that corresponds to a leaf av-pair of an advertised name- 
specifier also contains a pointer to the relevant name-record. In Figure 1 this 
is represented by broken arrows, and the name-tree shown stores two objects, 
one (i.e. R0) that provides a camera service in NE-43 and the other one (i.e. R1)
that provides a printer service in the same building. 

The name-records for a name-specifier are retrieved from a name-tree using 
the Lookup-Name operation. An algorithm for this operation is given in pseudo- 
code in the published description of INS [1], and is replicated in Appendix A. 
When it is invoked on the name-specifier and name-tree in Figure 1, R0 is returned
since the value of attribute 'service' sought by the client (i.e. camera)
does not match that provided by R1 (i.e. printer).
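To make the data structures concrete, the following Python sketch models a name-tree as nested value-nodes and resolves a name-specifier against it. It is a deliberately simplified matcher and not the Lookup-Name pseudo-code of Appendix A; the class `ValueNode`, the functions `add_name`, `records_below` and `lookup_name`, the nested-dictionary form of a name-specifier, and the flat attributes `building` and `service` are all assumptions made for illustration.

```python
class ValueNode:
    """A value-node: child attribute-nodes plus any attached name-records."""

    def __init__(self):
        self.attributes = {}  # attribute -> {value -> ValueNode}
        self.records = set()


def add_name(node, spec, record):
    """Insert an advertisement; spec maps attribute -> (value, child spec)."""
    for attribute, (value, children) in spec.items():
        values = node.attributes.setdefault(attribute, {})
        child = values.setdefault(value, ValueNode())
        if children:
            add_name(child, children, record)
        else:
            child.records.add(record)  # a leaf av-pair points at the record


def records_below(node):
    """All name-records stored in the subtree rooted at node."""
    found = set(node.records)
    for values in node.attributes.values():
        for child in values.values():
            found |= records_below(child)
    return found


def lookup_name(node, spec):
    """Intersect the records matched by each av-pair of the query."""
    result = None
    for attribute, (value, children) in spec.items():
        values = node.attributes.get(attribute, {})
        if value == "*":  # wild-card: any value is acceptable
            matched = set()
            for child in values.values():
                matched |= records_below(child)
        elif value not in values:
            matched = set()
        elif children:
            matched = lookup_name(values[value], children)
        else:
            matched = records_below(values[value])
        result = matched if result is None else result & matched
    return result if result is not None else set()
```

Adding a camera advertisement R0 and a printer advertisement R1 for building NE-43 and then resolving a query for a camera in NE-43 returns only R0, mirroring the example in Figure 1.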

An implementation of the naming architecture of INS appears in [21]. About 
1400 lines of Java code implement Lookup-Name and Add-Name and the relevant 
data structures and methods, and another 900 lines constitute the testing code 
used by INS inventors.

3 The Stochastic Approach

A genetic algorithm is an optimization heuristic that emulates natural processes
like selection and mutation in natural evolution. It evolves solutions to problems
that have large solution spaces and are not amenable to traditional search or
optimization techniques. Genetic algorithms have been applied to a broad range
of learning and optimization problems [19] since their inception by Holland [7].

Typically a genetic algorithm starts with a random population of solutions
(chromosomes). Through a recombination process and mutation operators it
evolves the population toward an optimal solution. Achieving an optimal solution
is not guaranteed and the task is to design the process to maximize the
likelihood of generating such a solution. The first step is the evaluation of the fitness
of solutions in the current population to act as parents in the next generation.
Solutions are considered more fit than others if they are closer to an optimum.

Upon evaluation, several solutions are selected, and solutions with a higher
fitness value are more likely to be selected. After selection, the parents are
recombined and mutated to generate offspring. The new population is thus
formed and the cycle is repeated.
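The cycle just described can be sketched in a few lines of Python. The sketch makes several assumptions that are not taken from the text: chromosomes are lists of bits, selection is fitness-proportional, recombination is single-point crossover, mutation flips individual bits, and fitness values are non-negative. The function name `evolve` and its parameters are hypothetical.

```python
import random


def evolve(population, fitness, generations, mutation_rate, rng):
    """Repeat evaluation, selection, recombination and mutation."""
    for _ in range(generations):
        scores = [fitness(chromosome) for chromosome in population]
        # Adding 1 keeps the total selection weight positive even if all
        # scores are zero; this assumes scores are never negative.
        weights = [score + 1 for score in scores]
        next_generation = []
        while len(next_generation) < len(population):
            first, second = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, len(first))  # single-point crossover
            child = first[:cut] + second[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_generation.append(child)
        population = next_generation
    return population
```

A fitness function with penalties, as used later for barriers, would need to be shifted to non-negative values before it could serve as a selection weight in this sketch.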

The processes of evaluation, selection, recombination and mutation are usu- 
ally performed many times in a genetic algorithm. Selection, recombination, 
and mutation are generic operations in any genetic algorithm and have been 
thoroughly investigated in the literature. On the other hand, evaluation is problem
specific and relates directly to the structure of the solutions. Therefore, in a 
genetic algorithm a major issue is to design the structure of solutions and the 
method of evaluation. Among other issues are size of the population, portion of 
population taking part in recombination, and mutation rate. 

Our GA-based testing technique has four essential components: 

• identification of methods to test and their global properties of interest; 

• framing a genetic encoding such that each chromosome represents a sequence
of operations of interest and their parameters; 

• formulation of the fitness function, which has three subparts: 

•• (trivial) modification of the methods identified in part one to reward 
chromosomes that access them by incrementing the score of a chromo- 
some per line of code it executes in such a method; 

•• identification of control points of interest in code and addition of either 
a bonus score or a penalty for causing execution of that point; 

•• awarding bonus points to chromosomes that possess the properties iden- 
tified in part one. The bonus or penalty is considerably greater than the 
score given per statement of execution in the first part; 

• application of standard evaluation, crossover, and mutation operators on the 
chromosomes in the current generation and move onto the next generation. 

An optimal chromosome would therefore encode a test suite that invokes a se- 
quence of operations of interest, and satisfies the desired properties with regards 
to that sequence, executes or avoids executing control points of interest as re- 
quired, and gives maximal code coverage. The following sections explain these 
notions in detail.

Fig. 2. Representation of a test suite



3.1 Genetic Encoding 

As a very first step of our testing technique we identify methods that we want
to test. The most interesting operations in the naming architecture of INS are
Lookup-Name and Add-Name. These methods in turn determine the genetic encoding
and fitness function that evaluates the chromosomes.

The most obvious way to test the behavior of operations is to have a chro- 
mosome represent which operation to perform along with its parameters. So if 
we were to test name resolution of INS, a chromosome could encode a Lookup- 
Name or Add-Name operation along with the name-tree and the name-specifier 
on which to perform that operation. 

A problem with this representation is that it is not immediate how to observe 
the combined effect of a sequence of dependent operations. For example, in INS, 
if for a given name-tree we want to determine the effect of repeated additions 
on the resolution of a fixed name-specifier in the resulting name-trees, it would 
not be feasible to do so. 

An alternative representation is to have a chromosome denote a sequence 
of operations with some parameter having an implicit representation. In the 
case of INS, a chromosome could then encode successive Lookup-Name and Add-Name
operations with only one parameter. It represents that sequence of operations
starting from an empty name-tree. So for example it could encode¹

add $N_1$, lookup $N_2$, add $N_3$, add $N_4$, lookup $N_5$

to represent a sequence of operations that starts with a new name-tree $T_0$, adds
$N_1$ to $T_0$ to result in name-tree $T_1$, resolves $N_2$ with respect to $T_1$, and so on.
This way we could evaluate how a chromosome performs based on the results 
generated by each of the Add-Name or Lookup-Name operation that it induces. 

We use a slight modification of this representation in our framework for 
testing INS. In particular, a chromosome denotes five name-specifiers, the last four of
which are to be inserted into an empty name-tree one by one, and the first of which is to
be resolved following each insertion. So, for example, the chromosome in Figure
2 would start execution by creating a new name-tree, then $N_1$ would be added
to the name-tree, followed by resolution of $N_0$, addition of $N_2$, resolution of $N_0$,
and so on. Notice that this structure is particularly well suited for investigating 
the effect that addition has on resolution, and there is no need to have an explicit 
encoding for a name-tree. 
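The operation sequence induced by such a chromosome can be written down directly. In this Python sketch the function name `operation_sequence` and the representation of a chromosome as a list of five name-specifiers are assumptions; the name-specifiers themselves are treated as opaque values.

```python
def operation_sequence(chromosome):
    """Expand five name-specifiers into the add/lookup sequence they induce.

    The first name-specifier is the query; the remaining four are the
    advertisements, each followed by a resolution of the query.
    """
    query, *advertisements = chromosome
    operations = []
    for advertisement in advertisements:
        operations.append(("add", advertisement))
        operations.append(("lookup", query))
    return operations
```

A chromosome with five name-specifiers therefore always induces eight operations on an initially empty name-tree, four additions interleaved with four resolutions.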

¹ For convenience, we write $N_i$ for Name-Specifier $i$.

Fig. 3. Genetic representation of a name-specifier



In order to represent a name-specifier we need to determine a suitable number
of bits that capture the behavior of methods under scrutiny. Due to the recursive
nature of Lookup-Name and Add-Name it is necessary to have a representation
that induces some recursive calls. We use two way branching at the top level
and allow one of the children to branch two way, while the other child may only
have one further child.

The name-specifier in the top right corner of Figure 5 depicts a full name-specifier
that can be encoded like this. Moreover, we select the attributes and
values from a pool of 8 attributes, {a0, ..., a7}, and 8 values, {v0, ..., v7}. This
gives us sufficient freedom to perform our testing using diverse test cases.

We use 34 bits (Figure 3) to represent a name-specifier and thus a chromosome 
can be represented using 170 bits, as shown in Figure 2. In Figure 3, 'T1' 
represents 1 or 2 way branching at the top level, 'T2' determines whether the 
first child has a child, and 'T3T3' determines whether the second child has 0, 
1, or 2 children. The sequence 'aaa' contains an attribute and 'vvv' contains a 
value. 
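
The bit counts above can be checked with a short calculation. The sketch below 
assumes that each attribute and each value takes 3 bits (enough to select one 
of 8), and that a full name-specifier offers five av-pair slots, as the 
branching rules imply. The constant names are illustrative and do not come 
from the paper. 

```python
# Bit budget of the encoding (a sketch; all names are illustrative).
ATTRIBUTE_BITS = 3   # 'aaa': selects one of the 8 attributes a0..a7
VALUE_BITS = 3       # 'vvv': selects one of the 8 values v0..v7
STRUCTURE_BITS = 4   # one bit, one bit and two bits for the branching choices

# av-pair slots in a full name-specifier: two top-level pairs,
# one child under the first, up to two children under the second.
AV_PAIR_SLOTS = 2 + 1 + 2

SPECIFIER_BITS = STRUCTURE_BITS + AV_PAIR_SLOTS * (ATTRIBUTE_BITS + VALUE_BITS)
CHROMOSOME_BITS = 5 * SPECIFIER_BITS  # five name-specifiers per chromosome

print(SPECIFIER_BITS, CHROMOSOME_BITS)  # 34 170
```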

Figure 4 illustrates a sample chromosome and presents the results of the 
test sequence it would induce. Name-specifiers 1-4 are inserted one by one into 
an empty name-tree to get the name-tree shown in the bottom right corner. 
Name-specifier 0 is resolved after each addition and the resulting name-records 
are displayed in the bottom line. Notice that during this execution as more ad- 
vertisements are added to the name-tree, resolution returns more name-records^. 



3.2 Fitness Function 

To evaluate the performance of a chromosome, we define our fitness function 
to have two components. The first component computes only the number of 
statements that are executed while simulating the sequence of operations 
encoded in a chromosome. In the case of INS, we pair every statement of the 
Lookup-Name procedure with a statement of the form score++;. This step can 
easily be automated. Notice that, based solely on this component, we can 
already start our experimentation, and the fittest chromosomes would try to 
maximize code coverage of this method. 

However, simply achieving maximal code coverage is not our goal. The second 
component of our fitness function is determined by the kind of tests we would 
like to perform. It uses two simple ideas. 

Firstly, in order to induce chromosomes to explore certain aspects of the 
system being tested, we award a bonus to chromosomes that do so. This could 
involve control points in the code that are more likely to lead to run-time 
errors, or global properties of the test sequence represented by a chromosome, 
for example rewarding chromosomes that produce differing results of the 
Lookup-Name operation. The chromosome presented in Figure 4 was in fact 
produced by awarding 10 extra points for each pair of different Lookup-Name 
results that it produced. 

^ The result of Lookup-Name is treated as empty if it is {} or {*}. 

Fig. 4. Visualisation of a sample chromosome to test INS. Lookup results 
recovered from the figure: Lookup1 = Lookup2 = [illegible], Lookup3 = {R2}, 
Lookup4 = {R2, R3}. 

Fig. 5. Revealing a flaw in the INS implementation. Lookup_i denotes the result 
of the ith call to Lookup-Name. 

Secondly, we introduce barriers in the form of penalty points for chromosomes 
that execute parts of the code that we have already determined no longer to be 
interesting from the point of view of further testing. This concept of using barrier 
functions turns out to be a very powerful idea as we demonstrate in the next 
section. 

The fact that we can use it to evolve chromosomes that do not visit certain 
parts of the code means that once we discover a bug, we do not have to fix it 
immediately in order to proceed with our testing. Instead we can just introduce 
a negative score at that control point and the testing system would evolve to test 
other parts of the code that can be executed independently of this buggy point. 

These two components together form our fitness function 

F = F_s + F_d 

where F_s is the first component and F_d the second, and we demonstrate its 
utility in the next section where we test INS. 
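
A minimal sketch of how the two components might be combined is given below. 
It is not the authors' code: the function name, the parameters and the default 
point values are assumptions chosen for illustration (the text mentions a 
10-point bonus and, later, penalties of 100 and 200 points). 

```python
def fitness(statements_executed, bonus_events, penalty_events,
            bonus_points=10, penalty_points=100):
    """Combine the coverage score with bonuses and barrier penalties.

    All names and default values are illustrative assumptions.
    """
    coverage_score = statements_executed  # one point per executed statement
    # Bonuses attract chromosomes to interesting behavior; penalties act as
    # barriers around code that is no longer interesting to test.
    adjustment = bonus_events * bonus_points - penalty_events * penalty_points
    return coverage_score + adjustment
```

With these assumed values, a chromosome that executes 40 statements and earns 
three bonuses scores 70, while the same chromosome crossing one barrier scores 
-30, so the barrier outweighs the coverage it gained. 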

4 Analysis 

In this section we use our testing technique to analyze the Java implementation 
of INS that is given in [21]. There are various properties of a naming scheme 
that we can test: for example, whether the name resolution mechanism ever 
returns objects whose functionality conflicts with what an application seeks, 
or whether it returns all objects that conform to a request. 

We test a fundamental property that we believe is essential for the correctness 
of a naming scheme. In particular, we see if addition behaves monotonically in 
INS, i.e. performing the Lookup-Name operation after an addition results in at 
least the name-records that result if the same name-specifier is resolved before 
that addition. 

In order to test this property we modify our fitness function to reward 
chromosomes that are able to violate it. This is achieved by comparing the 
elements of the sequence R_1, ..., R_4 of results produced by the Lookup-Name 
operations performed after each Add-Name operation that the chromosome 
induces^. If this sequence is found to have elements R_i and R_j for i > j such 
that R_i ⊂ R_j, we reward such a chromosome with an additional score of 100 
points for each such pair. 
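
The reward for violating monotonicity can be sketched as follows. The function 
name and the list-of-sets representation are assumptions made for this 
example, and the subset relation is read as proper inclusion. 

```python
from itertools import combinations


def monotonicity_bonus(results, points_per_pair=100):
    """Reward each pair of lookups in which a later result shrank.

    results[k] is the set of name-records returned by the lookup that
    follows the (k+1)-th addition.
    """
    violations = 0
    for j, i in combinations(range(len(results)), 2):  # always j < i
        if results[i] < results[j]:  # later result is a proper subset
            violations += 1
    return violations * points_per_pair
```

For instance, the sequence `[set(), {"R1"}, {"R1"}, set()]` contains two 
violating pairs (the fourth result against the second and against the third), 
so it earns 200 points under the assumed weighting. 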

After incorporating this change we execute our system and let it evolve to 
see if any of the chromosomes can actually result in such behavior. The evo- 
lution stabilizes in about 110 generations and the highest scoring chromosome 
represents the name-specifiers illustrated in Figure 5. As we see from the results 
of Lookup-Name operations, the last Lookup-Name operation produces no name- 
records, despite the fact that before the final addition was performed, a valid 
service, namely R1, was returned. 

Careful examination of the INS code reveals that the inventors of INS use 
a (boolean) flag to indicate whether a set contains all elements of a domain, 
instead of actually inserting those elements into it at the time of its creation 
(line 1 of pseudo-code in the Appendix). This flag representation later causes 
problems when set unions are performed, and results in loss of information. In 
Figure 5 this happens after the final addition when an attribute corresponding 
to a4 in the name-specifier to lookup is searched in the name-tree. This is an 
extremely subtle flaw, and our system quickly evolves to detect it and generates 
our first counterexample. 

We next explore the question of whether this violation of a fundamental 
property is solely due to the use of this flag-based representation. Our task 
now is to induce our system to evolve away from using any union operation that 
leads to this effect. To achieve this, we introduce a negative reward for any 
chromosome that causes an execution of the union operation when exactly one of 
the sets involved has its flag set and the other one is non-empty. We subtract 
200 points from the score of such a chromosome. 

^ We write R_i for Lookup_i, to represent the set of name-records resulting 
from an execution of Lookup-Name after the ith addition. 

Fig. 6. Another flaw in the INS implementation 

Fig. 7. Non-monotonicity of the Add-Name operation in INS 

Having set the parameters this way, we restart the evolution of our testing 
system and observe the behavior of the highest ranked chromosome. Around gen- 
eration number 98 the system stabilizes and the best chromosome in that state 
is presented in Figure 6. We only illustrate the first four name-specifiers since 
the desired effect is observed then. Notice that the third addition contradicts 
the monotonicity property. 

An analysis of the behavior of Lookup-Name on this test suite reveals that 
the INS implementation does not handle a value mismatch correctly. When 
attributes match at a certain level but no corresponding value matches, the 
implementation behaves in a fashion that once again leads to this erratic 
behavior. In Figure 6 this happens after the third addition, when a value 
corresponding to v2 in name-specifier 0 is searched in the name-tree among the 
children value-nodes of the attribute-node a0. It should be noted here that 
this behavior is not due to the bug discovered above. 

Having identified another cause of failure of the fundamental property of 
monotonicity in INS we once again use the idea of introducing a penalty function 
to discourage chromosomes from causing execution paths that lead to already 
discovered bugs. We now add an additional penalty of 100 points at the control 
point in Lookup-Name that handles a value mismatch. 

This time our system evolves to a stable state in about 65 generations. 
Figure 7 displays part of the highest scoring chromosome in that generation. 
The second addition triggers the required effect. It is interesting to note 
that this behavior is independent of the bugs discovered above in the INS 
inventors' implementation. 

In fact, this problem is due to a flaw in the semantics of INS. INS inventors 
defined missing attributes to act as wild-cards [1] and the Lookup-Name algo- 
rithm tries to incorporate that feature. However, this leads to INS displaying this 
highly undesirable behavior and there is no consistent notion of what it means 
for a name-record to conform to a name-specifier. 

Figure 8 shows the performance of our testing system in producing each 
of the three counterexamples discussed in this section. We plot the score of 
the best test suite in a generation (on the vertical axis) against the 
generation number. “BuggyStar.data” shows the results of the experiment that 
resulted in the chromosome in Figure 5, “ValueMismatch.data” that for Figure 6, 
and “NonMonotonicAdd.data” that for the chromosome in Figure 7. 

All the tests took less than 1 minute on an Intel Celeron 400 MHz processor. 
Throughout the experiments our genetic algorithm used a population size of 200 
chromosomes with the fittest 100 parenting offspring, a mutation rate of 0.05 
and a single-point crossover. It took fewer than 120 generations for the system 
to stabilize in each testing scenario. 
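
The parameters above can be put together in a small evolutionary loop. The 
sketch below is not the authors' implementation: it assumes that the mutation 
rate applies independently to each bit, that the 100 fittest chromosomes 
survive unchanged into the next generation, and that parents are paired at 
random. The `fitness` argument stands in for executing the instrumented INS 
code on the decoded chromosome. 

```python
import random

CHROMOSOME_BITS = 170
POPULATION_SIZE = 200
PARENTS = 100
MUTATION_RATE = 0.05


def crossover(a, b, rng):
    """Single-point crossover: a prefix of a joined to a suffix of b."""
    point = rng.randrange(1, CHROMOSOME_BITS)
    return a[:point] + b[point:]


def mutate(bits, rng):
    """Flip each bit independently with probability MUTATION_RATE (assumed)."""
    return [bit ^ 1 if rng.random() < MUTATION_RATE else bit for bit in bits]


def evolve(fitness, generations, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(CHROMOSOME_BITS)]
                  for _ in range(POPULATION_SIZE)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:PARENTS]  # the fittest half reproduces
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(POPULATION_SIZE - PARENTS)
        ]
        population = parents + children
    return max(population, key=fitness)
```

Calling `evolve(sum, 5)` runs the loop with a toy fitness that simply counts 
set bits; a real run would replace `sum` with a function that decodes the five 
name-specifiers and scores the induced Add-Name and Lookup-Name operations. 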

5 Related Work 

Other researchers have investigated the use of genetic algorithms for automating 
test data generation, but most work has focused on achieving maximal code or 
branch coverage. 

McGraw et al. [14] explore their use in dynamic test data generation where 
the problem of test data generation is reduced to one of minimizing a function. 
They provide an implementation of Korel's function minimization approach to 
test data generation using a genetic algorithm. A stated goal of their approach 
is to cover all branches in a program. 

Pargas et al. [16] present a goal-oriented technique for automatic test data 
generation using a genetic algorithm that is guided by the control dependencies 
in the program. They aim at achieving statement and branch coverage. 

The GA-based framework of Roper et al. [18] tests C programs by instru- 
menting them with probes that provide feedback on the coverage achieved. 

Jones et al. [10] have used genetic algorithms to generate test sets auto- 
matically that satisfy the requirements for test data set adequacy of structural 
testing. A recent paper by Bueno et al. [3] builds on their work and presents a 
tool for the automation of test data generation and infeasible path identification. 
Their focus is also to perform structural software testing. 

Groß [6] argues that genetic algorithms make Dynamic Timing Analysis of 
systems feasible, and give accurate predictions of a system's run-time behavior 
through their analysis of the interactions of the program's input parameters. 

Schultz et al. [20] apply GA-based machine learning techniques to the general 
problem of evaluating an intelligent controller for an autonomous vehicle. Their 
approach subjects a vehicle controller to an adaptively chosen set of fault 
scenarios within a vehicle simulator, and searches for combinations of faults, 
using genetic algorithms, that produce noteworthy performance by the controller. 

Our approach contrasts with these in several ways. First, we aim to test 
complex data structures and methods for manipulating them, and our primary 
concern is not to get the maximal code or branch coverage. Second, we use 
the idea of barrier functions (negative reward) which allows us to identify new 
bugs without having to fix the ones that we have already discovered. Third, we 
are able to test properties concerning interleaving of operations in a real world 
system. 

Recently [12], we created an object model of the naming infrastructure of INS 
in Alloy [8] and analyzed it with the Alloy Analyzer [9] to disprove a published 
claim made by the inventors of INS about the equivalence of wild-cards and 
missing attributes. Using that model, we also discovered that the published 
Lookup-Name algorithm [1] failed to handle certain boundary conditions and 
gave erratic results. Private communication with the INS inventors revealed that 
those boundary cases were fixed in their Java implementation given in [21]. 

The most important advance over our work in [12] is that the analysis 
presented here discovers bugs in the proposed fixes of the inventors, and, 
moreover, identifies a major flaw in the design of the naming semantics of INS 
and its name resolution algorithm. In [11] we extend our original Alloy model 
to reveal the flaws discussed here using the Alloy Analyzer. 

Using a model checker to verify properties about a structure of the 
complexity of INS requires a thorough understanding of the algorithms involved, 
and changing them necessitates remodeling. A model checker, however, typically 
guarantees to find a bug if one exists in (small) finite scope, provided the 
model is sound. Also, a model can be constructed without an actual 
implementation. 

Our analysis using genetic algorithms only needs elementary knowledge of 
the implementation details of INS. Moreover, since our GA-based framework 
manipulates the implementation code directly, the same framework can be used 
to incorporate any future changes to the code being tested. 

6 Conclusions 

We have presented and successfully demonstrated an automated test data gen- 
eration framework based on genetic algorithms that can be adapted to test com- 
plicated software structures and methods for manipulating them. Our approach 
is especially well suited to evaluating other naming schemes in which the corre- 
spondence between names and objects is non-trivial. 

Care, however, needs to be taken in order to adjust the parameters, espe- 
cially the fitness function, so as to induce the chromosomes to evolve to test the 
desired features. We decided to set the bonus or penalty points two orders of 
magnitude more than the reward for executing a statement of the code, after 
some experimentation. 

Designing a suitable genetic representation of the test data required some 
care. A cursory examination of the description of the data structures involved 
would lead to an inefficient encoding. The use of a representation that never 
encodes a name-tree directly makes it more versatile. 

We believe that the use of genetic algorithms in testing has great benefits, as 
they not only generate quality test data quickly but also can identify structural 
flaws that are particularly hard to detect otherwise. We view them as 
complementary to other standard testing tools. A static analysis tool, for 
example, might be used to assist in computing a suitable fitness function. 

It is our goal to identify a set of properties that encapsulates the correctness 
of a general naming scheme. This would be a first step in creating a framework 
for testing an arbitrary naming scheme using our GA-based testing technique. 

We would also like to explore the possibility of using these ideas in program 
slicing and detection of infeasible program paths. The concept of using barriers 
while evaluating fitness seems especially promising. 

A Pseudo-code for Lookup-Name 

The following pseudo-code description of Lookup-Name is taken from [1]. 



Lookup-Name(T, n) 
  S <- the set of all possible name-records 
  for each av-pair p := (na, nv) in n 
    Ta <- the child of T such that 
          Ta's attribute = na's attribute 
    if Ta = null 
      continue 
    if nv = *    // wild card matching 
      S' <- empty-set 
      for each Tv which is a child of Ta 
        S' <- S' union (all of the name-records in the 
              subtree rooted at Tv) 
      S <- S intersection S' 
    else         // normal matching 
      Tv <- the child of Ta such that 
            Tv's value = nv's value 
      if Tv is a leaf node or p is a leaf node 
        S <- S intersection (the name-records of Tv) 
      else 
        S <- S intersection Lookup-Name(Tv, p) 
  return S union (the name-records of T) 

Fig. 9. Lookup-Name algorithm 
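
For readers who prefer executable code, the pseudo-code can be transcribed 
roughly as follows. This is a sketch under stated assumptions, not the INS 
implementation: the classes `AVPair` and `ValueNode`, the explicit `universe` 
argument (standing in for "the set of all possible name-records"), and the 
literal reading of "the name-records of Tv" as the records stored directly at 
that node are all choices made for this example. The pseudo-code does not say 
what happens when no value-node matches; the sketch assumes that no record 
matches in that case. 

```python
from dataclasses import dataclass, field


@dataclass
class AVPair:
    attribute: str
    value: str
    children: list = field(default_factory=list)  # nested av-pairs


@dataclass
class ValueNode:
    records: set = field(default_factory=set)       # records stored at this node
    attributes: dict = field(default_factory=dict)  # attribute -> {value -> ValueNode}


def subtree_records(node):
    """All name-records in the subtree rooted at node."""
    found = set(node.records)
    for values in node.attributes.values():
        for child in values.values():
            found |= subtree_records(child)
    return found


def lookup_name(tree, pairs, universe):
    s = set(universe)
    for p in pairs:
        values = tree.attributes.get(p.attribute)
        if values is None:            # attribute absent from the tree: skip it
            continue
        if p.value == "*":            # wild card matching
            matched = set()
            for tv in values.values():
                matched |= subtree_records(tv)
            s &= matched
        else:                         # normal matching
            tv = values.get(p.value)
            if tv is None:
                s = set()             # assumption: a value mismatch matches nothing
            elif not tv.attributes or not p.children:
                s &= tv.records
            else:
                s &= lookup_name(tv, p.children, universe)
    return s | tree.records
```

On a tree with a single attribute a0 whose values v1 and v2 hold records R1 and 
R2, looking up (a0, v1) yields {R1}, looking up (a0, *) yields both records, 
and looking up an attribute that is absent from the tree also yields both 
records, which is the wild-card treatment of missing attributes discussed in 
Section 4. 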
Abstract. Web applications are becoming increasingly complex and 
important for companies. Their design, development, analysis and testing 
therefore need to be approached by means of support tools and 
methodologies. In this paper we consider the problems related to building 
tools for the analysis and testing of Web applications and we try to provide 
some indications on possible solutions, based upon our experience in the 
development of the tools ReWeb and TestWeb. 

The definition of a proper reference model will be discussed, as well as 
the impact of dynamic pages during Web site downloading and subsequent 
model construction. Visualization techniques addressing the large 
amount of extracted data will be presented, while infeasibility problems 
will be considered with reference to the testing phase. 



1 Introduction 

In recent years, Web applications have become important assets for several 
companies, being a convenient and inexpensive way to provide product 
information, e-commerce and services on-line. Since a software bug in a Web 
application could interrupt an entire business and cost millions of dollars, 
there is a strong demand for methodologies, tools and models that can improve 
the quality and reliability of Web sites [7,8]. For example, tools can support 
developers in understanding the abstract structure of a Web application by 
means of views and analyses, in ensuring that the requirement specifications 
are satisfied by the application, and in the testing phase. Developing a tool 
that extracts a model of a Web application, implements some static analyses and 
supports the developers in the testing phase is not easy. The main problems are 
related to modeling the abstract structure of Web applications, adapting known 
analysis and testing techniques to the characteristics of Web-based systems, 
and visualizing large graphs [6,10]. Only a few works have so far considered 
the problems related to Web site static analysis, maintenance and testing, and 
to building the associated tools. One of the first systematic studies on Web 
maintenance is [12], where the authors recognize the similarity between 
software systems and Web-based systems and the importance of the maintenance 
phase. They have built a tool called SiteSeer that downloads Web sites and 
computes some metrics on them. The paper [9] describes SPHINX, a Java toolkit 
and interactive development environment for Web spiders. SPHINX consists of two 
parts: the Spider workbench, a customizable spider that supports a graphical 
user interface and visualizes the recovered Web site as a graph, and the 
WebSPHINX class library, which provides support for writing Web spiders in 
Java. The CAPBAK/Web tool, explained in [8], is a Web testing tool that 
supports functional testing and regression testing. In [7] an approach to data 
flow testing of Web applications is presented. 

In this paper we consider the problems related to building tools for the 
analysis and testing of Web applications and we try to provide some indications 
on possible solutions, based upon our experience in the development of the 
tools ReWeb and TestWeb. ReWeb downloads and analyzes the pages of a Web 
application with a twofold purpose: building a model of the application and 
supplying some views and analyses to the developer. TestWeb, a structural 
testing tool, generates and executes a set of test cases for a Web application 
whose model was computed by ReWeb. 

The remainder of this paper is organized as follows: the next section describes 
a generic Web application infrastructure. Section 3 introduces the general 
architecture of our tools and presents the adopted analysis model for Web 
applications. Sections 4 and 5 explain problems encountered and solutions 
adopted in the development of the tools ReWeb and TestWeb. Finally, Section 6 
concludes the paper. 



2 Web Applications 

A typical generic Web Application infrastructure is shown in Figure 1 (a similar 
schema is proposed in [13]). 




Fig. 1. Web application infrastructure. 

The browser sends requests via HTTP to the server for an interactive 
view of Web pages. Web pages can be static or dynamic. While the content 
of a static page is fixed and stored in a repository, the content of a dynamic 
page is computed at run-time by the application server and may depend on the 
information provided by the user through input fields (a similar distinction is 
proposed in [4] and [5]). The programs that generate dynamic pages at run-time, 
such as CGI scripts and servlets, run on the application server and can 
use information stored in databases and other resources. The Web server and the 
application server can be located on the same machine or on different machines. 

Similarly to [5], we classify Web applications^ according to a taxonomy, or- 
dered by growing complexity, which is characterized by dynamism, page decom- 
position and data flow. 

— Level 0: static pages without frames. 

— Level 1: static pages with frames. 

— Level 2: dynamic pages without data transfer from client. 

— Level 3: dynamic pages with data transfer from client. 

The difficulties and problems in the construction of a tool that supports 
developers in the phases of analysis and testing of Web applications grow with 
increasing levels in the taxonomy. Applications at levels 2 and 3 typically exploit 
information stored inside a database to build the content of the dynamic pages. 
All four levels are in the scope of the proposed techniques. 
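
The taxonomy can be read as a simple decision rule. The sketch below assumes 
that an application is placed at the highest level whose features it uses; the 
function and parameter names are illustrative only. 

```python
def taxonomy_level(has_dynamic_pages, has_frames, client_sends_data):
    """Return the taxonomy level (0-3) of a Web application.

    Assumption: the most demanding feature in use decides the level.
    """
    if has_dynamic_pages:
        # Levels 2 and 3 differ only in whether the client transfers data.
        return 3 if client_sends_data else 2
    # Static sites are distinguished by page decomposition into frames.
    return 1 if has_frames else 0
```

Under this reading, a site with forms whose values are submitted to a CGI 
script is at level 3 regardless of whether it also uses frames. 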

3 Tool Architecture 




Fig. 2. Roles of ReWeb and TestWeb. 



The two tools ReWeb and TestWeb have been developed to support analysis 
and testing of Web applications. Their relative roles are schematized in 
Figure 2. ReWeb downloads and analyzes the pages of a Web application with 
the purpose of building a model of it and producing some analyses and views. 
TestWeb generates and executes a set of test cases for a Web application whose 
model was computed by ReWeb. The whole process is semi-automatic, and the 
interventions of the user are indicated within diamonds in Figure 2. Explanations 
of the manual interventions will be given in the following sections. 

^ Although some authors distinguish between Web sites and applications, using the 
latter term only in the presence of dynamic pages (our levels 2 and 3), we will 
use them interchangeably in the following if the distinction is not important. 

Both tools perform their operations on an abstraction of the Web applica- 
tions, indicated in Figure 2 as UML model. UML, the Unified Modeling Lan- 
guage [3], was exploited to express such a model. Let us consider the key require- 
ments on the model. We are interested in a model that can be directly abstracted 
from the implementation. Some important characteristics that it should have can 
be summarized as follows: 

— The focus should be on the navigational features of the site; 

— It should be complete, i.e., the most important entities, such as links, frames, 
forms and dynamic pages, must be explicitly represented in the model; 

— It should be possible to provide (partial) automatic support for its extraction; 

— It should be possible to apply to it some static analyses and testing techniques 
derived from those used with traditional software systems; 

— It should be possible to derive some views from it that represent the Web site in 
an intuitive way. 




Fig. 3. Meta model of a generic Web application structure. 



Figure 3 shows our meta model used to describe the elements in the model 
of a Web application. It satisfies all key requirements given above. The central 
entity in a Web site is the WebPage. A Web page contains the information to 
be displayed to the user, and the navigation links toward other pages. It also 
includes organization and interaction facilities (e.g., frames and forms). 

The two subclasses of WebPage model the static and dynamic pages. When 
the content of a dynamic page depends on the value of a set of input variables, 
the attribute use of class DynamicPage contains them. 

A frame is a rectangular area in the current page where navigation can take 
place independently. Moreover the different frames into which a page is decom- 
posed can interact with each other, since a link in a page loaded into a frame can 
force the loading of another page into a different frame. This can be achieved by 
adding a target to the hyperlink. Organization into frames is represented by the 
association split into, whose target is a set of Frame entities. Frame subdivision 
may be recursive (auto-association split into within class Frame), and each frame 
has a unary association with the Web page initially loaded into the frame (absent 
in case of recursive subdivision into frames). When a link in a Web page forces 
the loading of another page into a different frame, the target frame becomes the 
data member of the (optional) association class LoadPageIntoFrame. 

In HTML user input can be gathered by exploiting forms. A Web page can 
include any number of forms (association include). Each form is characterized 
by the input variables that are provided by the user through it (data member 
input). Values collected by forms are submitted to the Web server via the special 
link submit, whose target is always a dynamic page. Since links, frames and forms 
are part of the content of a Web page, and for dynamic pages the content may 
depend on the input variables, even the organization of a page is, in general, 
not fixed and depends on the input. This is the reason for the association class 
ConditionalEdge, which optionally adds a boolean condition, a function of the 
input variables, representing the existence condition of the association (which 
can in turn be a link, an include or a split into). The target, page, form or frame, 
is referenced by the source dynamic page only when the input values satisfy the 
condition in the ConditionalEdge. 
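
To make the meta model more concrete, part of it can be sketched as a set of 
classes. This is only an illustration: the class and attribute names follow 
the text where it gives them (WebPage, Frame, `use`, `input`, `include`, 
`split_into`, ConditionalEdge), while the field types, the `exists` helper and 
the representation of conditions as callables are assumptions. The submit link 
and the LoadPageIntoFrame association class are left out for brevity. 

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Frame:
    name: str
    initial_page: Optional["WebPage"] = None        # absent if the frame is split further
    split_into: list = field(default_factory=list)  # nested Frame objects


@dataclass
class Form:
    input: list = field(default_factory=list)       # names of the input variables


@dataclass
class ConditionalEdge:
    target: object                                   # a WebPage, Form or Frame
    condition: Optional[Callable[[dict], bool]] = None  # None: always present

    def exists(self, inputs):
        """True if the association exists for the given input values."""
        return self.condition is None or self.condition(inputs)


@dataclass
class WebPage:
    url: str
    links: list = field(default_factory=list)       # ConditionalEdge to pages
    include: list = field(default_factory=list)     # ConditionalEdge to forms
    split_into: list = field(default_factory=list)  # ConditionalEdge to frames


@dataclass
class StaticPage(WebPage):
    pass


@dataclass
class DynamicPage(WebPage):
    use: list = field(default_factory=list)         # input variables the content depends on
```

A dynamic page whose link back to the home page appears only for an empty 
query would then carry a `ConditionalEdge` whose condition inspects the 
`query` input, while edges leaving static pages keep `condition=None`. 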



4 ReWeb 



The ReWeb tool consists of three modules: a Spider, an Analyzer and a Viewer. 
The Spider downloads all pages of a target Web site, starting from a given URL 
and providing the input required by dynamic pages, and it builds a model of 
the downloaded site. Each page found within the site host is downloaded and 
marked with the date of downloading. The HTML documents outside the Web 
site host are not considered. The user has to specify the set of inputs for each 
page that contains forms. The Analyzer uses the UML model of the Web site and 
the downloaded pages to perform several analyses, presented in the following, 
some of which are exploited during static verification. The Viewer provides a 
Graphical User Interface (GUI) to display the Web application model as well as 
the output of the static analyses. The graphical interface supports a rich set of 
navigation and query facilities. The Spider and the Analyzer are written in 
Java, while the Viewer is based on Dotty^. 



4.1 Spider 



SPIDER(target_url) 
 1  UML_Model ← ∅ 
 2  Error_urls ← ∅ 
 3  Pages_already_visited ← ∅ 
 4  Urls_found ← ∅ 
 5  S ← {target_url} 
 6  while (S ≠ ∅) 
 7      chosen_url ← chooseElement(S) 
 8      S ← S \ {chosen_url} 
 9      if not (chosen_url ∈ Pages_already_visited) then 
10          if (chosen_url is OK) then 
11              Pages_already_visited ← Pages_already_visited ∪ {chosen_url} 
12              if (chosen_url is an HTML page) then 
13                  Download(chosen_url) 
14                  Urls_found ← scanPage(chosen_url) 
15                  S ← S ∪ Urls_found 
16                  AddElementsToModel(UML_Model, chosen_url, Urls_found) 
17              endif 
18          else 
19              Error_urls ← Error_urls ∪ {chosen_url} 
20          endif 
21      endif 
22  endwhile 



Fig. 4. Pseudo-code of the Spider. 



Web pages are not actually written in a single language. They are better 
regarded as multi-language documents, in which code fragments in languages 
other than HTML can be loaded (e.g., applets) or interpreted (e.g., Javascript). 
Libraries for the construction of Spider programs are available for programming 
languages such as Perl, C/C++, or Java. An example is the WebSPHINX class 
library [9]. We decided to implement our Web Spider just exploiting the Java 
language and its standard library, starting from scratch, in order to have to- 
tal control on the multilingual aspects of the downloaded pages. We developed a 
parser which recognizes both HTML and Javascript code fragments, and extracts 
the needed information (links, forms, frames, etc.) from them. 

^ Dotty is a customizable graph editor developed at AT&T Bell Laboratories by 
Eleftherios Koutsofios and Stephen C. North. 

Figure 4 shows the pseudo-code of our Spider. The procedure SPIDER takes 
a given URL as input and builds the associated UML model. The body of the 
while command (lines 6-22) is executed as long as there are elements in 
the set S. The function chooseElement (line 7) chooses an element in the set S, 
while the condition chosen_url is OK (line 10) is true if chosen_url is well-formed 
and the corresponding page exists in the Web site. The condition chosen_url is 
an HTML page (line 12) is true if the content type of the document connected 
to chosen_url is HTML. In line 13 the procedure Download is called to store the 
retrieved page in the file-system. The function scanPage (line 14) scans the page 
and returns the set of URLs found within it and contained in the site host. The 
procedure AddElementsToModel (line 16) adds nodes and edges to the model 
in accordance with our meta model. If the set S is implemented as a stack, the 
algorithm visits the Web site in a depth-first way, while using a queue produces a 
breadth-first visit. 
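
The worklist structure of the Spider, and the effect of choosing a stack or a 
queue for S, can be illustrated with a small sketch. It is not the ReWeb code: 
to stay self-contained it walks an in-memory dictionary `site` (an assumption 
of this example) instead of downloading pages, it omits the HTML content-type 
check, and it uses a plain dictionary in place of the UML model. 

```python
from collections import deque


def spider(target_url, site, depth_first=False):
    """Visit every page reachable from target_url.

    site maps each existing URL to the list of URLs found on that page.
    Returns the recovered model and the set of erroneous URLs.
    """
    model = {}           # page -> URLs it references (stand-in for the UML model)
    error_urls = set()
    visited = set()
    worklist = deque([target_url])
    while worklist:
        # A stack gives a depth-first visit, a queue a breadth-first one.
        url = worklist.pop() if depth_first else worklist.popleft()
        if url in visited:
            continue
        if url not in site:          # stand-in for the "is OK" check
            error_urls.add(url)
            continue
        visited.add(url)
        found = site[url]            # stand-in for Download and scanPage
        worklist.extend(found)
        model[url] = list(found)
    return model, error_urls
```

For the toy site `{"/": ["/a", "/b"], "/a": ["/", "/missing"], "/b": []}`, the 
breadth-first visit records the pages in the order /, /a, /b, the depth-first 
visit in the order /, /b, /a, and both report /missing as an erroneous URL. 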

Problems encountered in the construction of the Spider are due to irregu- 
larities and ambiguities present in HTML code, also noted in [12], and to the 
current state of the Web technology, offering a large spectrum of alternatives to 
implement a web site. Our solution was to improve the robustness of the parser, 
so that it could accept a superset of HTML including the main irregularities 
commonly recognized and properly interpreted by available browsers. 

Dynamic pages pose an additional problem to the activity of the Spider. Since 
the content of these pages is decided at run time, it may in general depend 
on the input previously provided by the user. In particular, the structure of 
a dynamic page may change when it is encountered in a different interaction. 
Since the model of a Web site encompasses all possibilities, the Spider has to 
recover all variants of a dynamic page, and has to merge them into a single 
representative object. This can be achieved by specifying the input values to be 
provided before downloading the dynamic page of interest. Moreover, the same 
dynamic page has to be downloaded several times, with different inputs, when 
the different conditions generate a different page structure. All sequences of input 
values to be provided before each page download are specified in a file which is 
read by the Spider. All dynamic pages specified in the file are downloaded after 
providing the Web server with the given inputs. Finally, all versions of the same 
page are merged. Although the number of inputs to be provided may explode 
combinatorially, our experience suggests that in practice few alternatives are 
sufficient to cover all variants. 

An additional input that may affect the content displayed in a dynamic page 
is the cookie that the browser provides to the Web server. A cookie is a user 
identifier that is stored by the browser in the local file system and is provided 
to the Web server to allow user identification each time a new connection with a 
given server is established. After recognizing the user, the Web server can provide 
a customized version of the dynamic pages in the site. Since their structure may 
depend on the cookie, the Spider needs the ability to send a cookie to the Web 
server, in order to obtain also the pages that are generated when the user is 
identified. 

4.2 Analyzer 

The UML model of a Web site can be interpreted as a graph by associating
objects with nodes and associations with edges. Some simple analyses may
determine the presence of unreachable pages, i.e., pages that are available at
the server site but cannot be reached along any path starting from the initial
page. They are obtained as the difference between the pages available in the
Web server file system and those downloaded by the Spider. Ghost pages are
associated with pending links, which reference a non-existent page.

FLOW_ANALYSIS(graph G = (N, E))

1  for each (n ∈ N)
2      initialize IN_n and OUT_n
3  endfor
4  change ← true
5  while (change)
6      change ← false
7      for each (n ∈ N)
8          IN_n ← ⊕_{p ∈ pred(n)} OUT_p
9          OLDOUT_n ← OUT_n
10         OUT_n ← GEN_n ∪ (IN_n − KILL_n)
11         if OUT_n ≠ OLDOUT_n then
12             change ← true
13         endif
14     endfor
15 endwhile

Fig. 5. Pseudo-code of the flow analysis algorithm.

More advanced analyses [11] can be derived from the general framework of
flow analysis [1], described by the algorithm in Figure 5. The algorithm
propagates flow information inside a graph until the fix-point is reached. The kind of
flow information to be propagated depends on the purpose of the analysis being
performed. Some examples are given below. Moreover, the confluence operator
exploited at line 8 to collect outgoing information from predecessor nodes is also
dependent on the analysis, and is typically either the intersection or the union.
After initializing the input and output sets of each node (IN_n and OUT_n, lines
1-3) with the initial flow information, propagation is achieved inside a fix-point
loop (line 5) by subtracting the destroyed information (KILL_n set) and adding
the generated information (GEN_n set) to the incoming information (line 10) for
each node n in the graph.
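
The fix-point iteration can be sketched in a few lines. This is an illustration only, assuming that flow information is represented as Python sets; the function names flow_analysis and union, and the three-node example graph, are hypothetical and not taken from the tool.

```python
def flow_analysis(nodes, edges, gen, kill, confluence, init):
    """Iterate the flow equations until a fix-point is reached.

    nodes: iterable of node names; edges: set of (source, target) pairs.
    gen, kill: dicts mapping each node to a set of flow facts.
    confluence: function combining a list of OUT sets.
    init: dict giving the initial OUT set of each node.
    """
    nodes = list(nodes)
    preds = {n: [p for (p, q) in edges if q == n] for n in nodes}
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set(init[n]) for n in nodes}
    change = True
    while change:
        change = False
        for n in nodes:
            in_sets[n] = confluence([out_sets[p] for p in preds[n]])
            old_out = out_sets[n]
            out_sets[n] = gen[n] | (in_sets[n] - kill[n])
            if out_sets[n] != old_out:
                change = True
    return in_sets, out_sets

def union(sets):
    """Confluence operator that merges the facts of all predecessors."""
    result = set()
    for s in sets:
        result |= s
    return result

# Hypothetical example: node b destroys fact "x" and generates fact "y".
NODES = ["a", "b", "c"]
EDGES = {("a", "b"), ("b", "c"), ("a", "c")}
GEN = {"a": {"x"}, "b": {"y"}, "c": set()}
KILL = {"a": set(), "b": {"x"}, "c": set()}
INIT = {n: set() for n in NODES}
IN, OUT = flow_analysis(NODES, EDGES, GEN, KILL, union, INIT)
```

In this example node c receives both "x" (directly from a) and "y" (from b), since the union collects the information arriving along every incoming edge. An analysis based on intersection would pass a different confluence function and a different initial value.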

An example of analysis which specializes the algorithm in Figure 5 is the
computation of the reaching frames, which determines the set of frames in
which each page can appear. When a page is loaded into a frame as its initial
page or is reachable through an edge decorated with a LoadPageIntoFrame
association class instance, it generates the name of the frame as flow information. By
propagating such information along the site graph until the fix-point is reached,
the reaching frames of each page are determined. The outcome of the reaching
frames analysis is useful to understand the assignment of pages to frames. The
presence of undesirable reaching frames is thus made clear. Examples are the
possibility to load a page at the top level, while it was designed to always be
loaded into a given frame, or the possibility to load a page into a frame where
it should not be.

Flow analyses can be employed in a more traditional fashion to determine the 
data dependences. Nodes of kind Form generate a definition of each variable 
in the input set. Such definitions are propagated along the edges of the Web site
graph. If a definition of a variable reaches a node where the same variable is used
(use attribute of a dynamic page), there is a data dependence between defining
node and user node. Data dependences are useful to represent the information 
flows in the application. They may reveal the presence of undesirable possibilities, 
such as using a variable not yet defined or using an incorrect definition of a 
variable. Data dependences are also extremely important for dynamic validation, 
when data flow testing techniques are adopted. 

When the pages of a site are traversed, it is impossible to reach a document 
without traversing a set of other pages, called its dominators. Sites in which 
traversing a given page is considered mandatory, e.g., because it contains im- 
portant information, will have it in the dominator set of every node. Dominator 
analysis, also derived from the algorithm in Figure 5, automates the check. 
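
As an illustration of how such a check could be automated, the sketch below computes dominator sets with the usual iterative formulation, in which the confluence operator is the intersection and each node adds itself to the incoming information. The function name dominators and the four-page site are assumptions made for the example.

```python
def dominators(nodes, edges, entry):
    """Compute the dominator set of every node.

    dom(entry) = {entry}; for any other node n, dom(n) is {n} united
    with the intersection of dom(p) over all predecessors p of n.
    """
    nodes = list(nodes)
    preds = {n: [p for (p, q) in edges if q == n] for n in nodes}
    dom = {n: set(nodes) for n in nodes}   # optimistic initial value
    dom[entry] = {entry}
    change = True
    while change:
        change = False
        for n in nodes:
            if n == entry:
                continue
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                change = True
    return dom

# Hypothetical site: the download page should only be reachable
# after the license page has been traversed.
PAGES = ["home", "license", "news", "download"]
LINKS = {("home", "license"), ("home", "news"),
         ("news", "license"), ("license", "download")}
DOM = dominators(PAGES, LINKS, "home")
```

Here the license page belongs to the dominator set of the download page, so every navigation reaching the download page traverses it; the news page does not, because the license page can also be reached directly from the home page.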

The evolution of web sites [10,12] is another interesting object of investiga- 
tion. Such an analysis requires the ability to compare successive versions of its 
pages and to graphically display the differences. Given two versions of a web 
site, downloaded at different dates, their comparison aims at determining which 
pages were added, modified, deleted or left unchanged. It can be combined with 
the static analyses described above, since their re-computation over time allows 
controlling the evolution of the application quality. 
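
The classification of pages across two versions can be sketched as follows, assuming that each downloaded version is summarized as a dictionary from page URL to a digest of the page content. The function name compare_versions and the sample data are hypothetical.

```python
def compare_versions(old, new):
    """Classify the pages of two site snapshots.

    old, new: dicts mapping a page URL to a digest of its content.
    Returns four sets: added, deleted, modified, unchanged.
    """
    old_pages, new_pages = set(old), set(new)
    added = new_pages - old_pages
    deleted = old_pages - new_pages
    common = old_pages & new_pages
    modified = {p for p in common if old[p] != new[p]}
    unchanged = common - modified
    return added, deleted, modified, unchanged

# Hypothetical snapshots taken at two different dates.
V1 = {"index.html": "d1", "a.html": "d2", "b.html": "d3"}
V2 = {"index.html": "d9", "a.html": "d2", "c.html": "d4"}
ADDED, DELETED, MODIFIED, UNCHANGED = compare_versions(V1, V2)
```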

4.3 Viewer 

The graph view of a Web application is a graph, whose nodes correspond to the
objects in the model and whose edges correspond to the associations between
objects. Labeled edges are used for the links having a LoadPageIntoFrame or
ConditionalEdge relation specifier. In the graph view, to intuitively suggest
decomposition into frames, we adopt the convention of joining horizontally the
nodes of type frame contained in the same page, and collapsing the edges of
type “split into” into a single edge. An example of decomposition into frames
is shown in Figure 6. Page madmaxpub/index.html (the main page) is divided
into two frames with identifiers a and b, and frame a is used as a menu to force
the loading of pages into the other frame.

The graph view of a web application can be enriched with information about
its history [10], by coloring the nodes and associating different colors to different
time points (see Figure 6). In particular, a scale of colors ranging from blue
through green to red can be employed to represent nodes
added/modified in the far past, in the medium past or more recently.

The Viewer is based on Dotty, and uses the algorithm explained in [6] for 
drawing directed graphs. The aesthetic principles followed by the algorithm are: 
to expose hierarchical structure (if any) in the graph, to avoid edge crossings 
(if possible) and sharp bends, to keep edges short, to favor symmetry and
balance. The layout of the graph view of a web site is very important for
understanding its structure, especially when the site is very complex.

Another problem connected with the visualization of a web site is the fact
that even small sites (e.g. with 100 pages) can have an entangled structure that is
difficult to understand. A way to improve the Viewer display is to use techniques to
abstract, to simplify, to extract a portion of, or to see only a part of the graph
view. Another possibility can be to add other views of a web site, for example
the birdeye and overview diagrams, or to display only the depth-first tree
without return edges (solution adopted in [9]). The views and facilities we propose
follow. The system view represents the organization of pages into directories;
the data flow view displays the read/write accesses of pages to variables,
respectively through incoming/outgoing edges linking pages to variables; the history
representation with percentage bars describes, in a compact way, the percentages
of nodes with the same color. Among the provided facilities, the viewer supports
zoom, search, deletion of incoming or outgoing edges, and focus. The facilities
for focusing on and searching a node are useful when the visualized graphs are
very large. By exploiting the focusing facility it is possible to display only a
limited neighborhood of a selected node. Another possibility to access the graph
view, not yet implemented, is the identification and extraction of a portion of
a web site by means of pattern matching techniques. Recurrent patterns are
expected to be used in the design of web sites (for example tree, hierarchy, full
connectivity, indexed-sequence).

Fig. 6. Colored graph view of the site www.ubicum.it/madmaxpub at date 3-2-2000.

5 TestWeb 

Web sites can involve a complex interaction among Web browsers, operating
systems, plug-in applications, communication protocols, Web servers, databases,
server programs (for example CGI programs) and firewalls. Such complexity
makes the test of Web sites a great challenge [8]. Ideally all components and
functionality of a Web site on both client and server sides should be completely
tested. However, this is rarely possible in modern Web site projects because of
the extreme time pressure under which Web systems are developed. Available
testing techniques [8] differ in the features of the Web site they target. For
example, it is possible to execute link testing, HTML validation, performance
testing, and security testing. We are interested in dynamic validation using our
UML model as a base for this type of testing. In general, dynamic validation
methods aim at exercising the system by supplying a vector of input data (test
case) and comparing the expected outputs with the actual ones after execution.
In particular, we considered white box testing of Web applications: the internal
structure of a Web application is accessed to measure the coverage that a given
test suite (collection of test cases) reaches, with respect to a given test criterion
(stating the features to be tested). Some white box testing criteria, derived from
those available for traditional software [2], are: page testing, hyperlink testing,
definition-use testing, all-uses testing, and all-paths testing. A test case for a Web
application is a triple: URL, input (a sequence of variable-value assignments
separated by the character '&'), type of parameter passing (GET or POST).
Execution consists of requesting the Web server for the URL in the triple with
the associated input and storing the output pages. Satisfaction of any of the white
box testing criteria involves selecting a set of paths in the Web site graph and
providing input values. Since path selection is independent (conditional edges
excluded) from input values, it can be automated.
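
The test case triple can be pictured with a small data structure. The sketch below only builds the request that would be sent; it does not contact any server. The class name WebTestCase, the method request_parts and the example URL are assumptions; the variable names va and jump are borrowed from the dictionary example discussed later in this section.

```python
from dataclasses import dataclass
from urllib.parse import urlencode

@dataclass
class WebTestCase:
    url: str        # page to request
    inputs: dict    # variable-value assignments
    method: str     # type of parameter passing: "GET" or "POST"

    def request_parts(self):
        """Return (url, body) as they would be sent to the Web server."""
        encoded = urlencode(self.inputs)   # assignments joined by '&'
        if self.method == "GET":
            # GET passes the inputs in the query string and has no body.
            return (self.url + "?" + encoded if encoded else self.url, None)
        # POST passes the inputs in the request body.
        return (self.url, encoded.encode("ascii"))

# Hypothetical test cases for a dictionary page.
GET_CASE = WebTestCase("http://www.example.com/dictionary",
                       {"va": "test", "jump": "2"}, "GET")
POST_CASE = WebTestCase("http://www.example.com/dictionary",
                        {"va": "test"}, "POST")
```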




Fig. 7. Architecture of the tool TestWeb.



TestWeb (see Figure 7) contains a test case generation engine (Test generator),
able to generate test cases from the UML model of a Web application.
The user has to add some information to the model produced by ReWeb to
complete it for testing purposes and furthermore the user has to choose a test
criterion. The user specifies the page type when the distinction between static
and dynamic pages cannot be obtained automatically (e.g., dynamic pages with
no input). The user also provides the set of used variables, use, for each dynamic
page whose content depends on some input value. Finally, the user has
to attach conditions to the edges whose existence depends on the input values.
Additional manual interventions, related to state unrolling, will be described in
the following on an example. Generated test cases are sequences of URLs which,
once executed, grant the coverage of the selected criterion. Input values in each
URL sequence are left empty by the Test generator, and the user has to fill
them in, possibly exploiting the techniques traditionally used in black box testing
(boundary values, etc.). TestWeb's Test executor can now provide the URL
request sequence of each test case to the Web server, attaching proper inputs
to each form. The output pages produced by the server are stored for further
examination. After execution, the test engineer intervenes to assess the pass/fail
result of each test case. A second, numeric output of test case execution is the
level of coverage reached by the current test suite.
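
As an example of such a numeric output, the sketch below computes a coverage level for the hyperlink testing criterion, under the assumption that coverage is measured as the fraction of site edges traversed by at least one test case. The function name hyperlink_coverage and the sample site are hypothetical; the measure actually reported by the tool may be defined differently.

```python
def hyperlink_coverage(site_edges, test_suite):
    """Fraction of site edges traversed by at least one test case.

    site_edges: set of (source page, target page) pairs in the model.
    test_suite: list of test cases, each a sequence of page URLs.
    """
    covered = set()
    for path in test_suite:
        # Consecutive pages of a test case form the traversed edges.
        covered |= set(zip(path, path[1:])) & site_edges
    return len(covered) / len(site_edges) if site_edges else 1.0

# Hypothetical site with four hyperlinks and a one-case test suite.
SITE_EDGES = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "a")}
SUITE = [["a", "b", "c"]]
```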

Regression testing benefits greatly from the automation of test case execution,
since each test case can be re-executed unattended on a new version of the Web
application, and its output pages can be automatically compared with those
obtained from a run of the previous version.

5.1 Test Generator 

Given the graph representation of a Web application, a reduced graph can be 
computed for the purposes of white box testing: each static page without forms 
is removed from the graph by a Cross-Term step described in [2] (see Figure 8).




Fig. 8. Step of the Cross-Term algorithm at a node selected for removal. 



In the resulting graph, a fictitious entry node is added, connected with all
nodes with no predecessor, and a fictitious exit node is made directly reachable
from all output nodes, i.e., dynamic nodes with non-empty use attribute. In fact, the end
of a computation is reached, in a Web application, when some result is displayed
to the user, but no intrinsic notion of termination for a navigation session exists.

Differently from the flow-graph of a structured program, the graph view of a
Web application can contain horrible-loops [2], i.e., there may be nodes jumping
into or out of a loop and/or there may be more than one iterating node for
the same loop. In the presence of horrible-loops the usual strategies used to cover
nested-loops and concatenated-loops do not work. We have chosen a general
solution: a test case generation technique based on the computation of the path
expression [2] of the reduced Web site graph. A path expression is an algebraic
representation of all paths in a graph. Variables in a path expression are edge
labels. They can be combined through operators + and *, associated
respectively with selection and loop. Brackets can be used to group subexpressions.

REDUCTION(graph)

1  Combine all serial links by multiplying their path expressions
2  Combine all parallel links by adding their path expressions
3  Remove all self-loops by replacing them with a link of the form X*
4  while (number of nodes in the graph > 2)
5      n ← choose a node of the graph different from initial or final node
6      Apply Cross-Term elimination to n
7      Combine any remaining serial links as in step 1
8      Combine all parallel links as in step 2
9      Remove all self-loops as in step 3
10 endwhile



Fig. 9. Reduction algorithm. 



Computation of the path expression for a site can be performed by means of the 
Reduction algorithm described in [2] and depicted in Figure 9. 

Lines 1-3 of the Reduction algorithm initialize the process and put the
graph in normal form. The body of the while command is executed as long as the
number of nodes in the graph is greater than 2. Line 5 assigns a node of the
graph different from the initial or final node to variable n, while line 6 executes
a Cross-Term step on node n. This step eliminates the node n and transforms
the graph according to the diagram shown in Figure 8. Lines 7-9 combine all
serial and parallel links and remove self-loops. At the end of the execution, the
path expression of the input graph is obtained.
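
A compact way to experiment with the reduction is the node-elimination sketch below. It is a simplified variant and not the algorithm of [2] verbatim: serial, parallel and self-loop combination are folded into the elimination of each node, path expressions are kept as plain strings, and the elimination order is supplied by the caller. The function name path_expression and the example graph are hypothetical; an empty label is assumed for the unlabelled link to the fictitious exit node, and parallel empty labels are not handled.

```python
def path_expression(edges, entry, exit_node, order):
    """Reduce a graph to one path expression from entry to exit_node.

    edges: list of (source, target, label) triples.
    order: the nodes to eliminate (all nodes except entry and exit_node).
    """
    expr = {}   # (source, target) -> expression of the combined link

    def add(u, v, e):
        # Parallel links are combined by adding their expressions.
        expr[(u, v)] = expr[(u, v)] + "+" + e if (u, v) in expr else e

    def factor(e):
        # Bracket a sum before it is used inside a product.
        depth = 0
        for ch in e:
            depth += (ch == "(") - (ch == ")")
            if ch == "+" and depth == 0:
                return "(" + e + ")"
        return e

    def star(e):
        # A self-loop with expression X becomes a link of the form X*.
        return (e if len(e) == 1 else "(" + e + ")") + "*"

    for u, v, label in edges:
        add(u, v, label)
    for n in order:
        loop = expr.pop((n, n), None)
        middle = star(loop) if loop else ""
        ins = [(u, e) for (u, v), e in expr.items() if v == n]
        outs = [(v, e) for (u, v), e in expr.items() if u == n]
        for u, _ in ins:
            del expr[(u, n)]
        for v, _ in outs:
            del expr[(n, v)]
        # Cross-term step: connect every predecessor to every successor.
        for u, e_in in ins:
            for v, e_out in outs:
                add(u, v, factor(e_in) + middle + factor(e_out))
    return expr[(entry, exit_node)]

# Hypothetical graph with two loops through node P.
GRAPH = [("entry", "P", "a"), ("P", "Q", "b"), ("Q", "P", "c"),
         ("P", "R", "d"), ("R", "P", "e"), ("P", "exit", "")]
RESULT = path_expression(GRAPH, "entry", "exit", ["Q", "R", "P"])
```

Different elimination orders yield expressions that look different but denote the same set of paths; eliminating Q and R before P keeps the result short in this example.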



PATH_GENERATION(path_expression) 

1 while criterion not satisfied 

2 for each alternative from inner to outer nesting 

3 choose one never considered before, if any 

4 or randomly choose one 

5 endfor 

6 if computed path increases coverage then 

7 add it to the resulting paths 

8 endif 

9 endwhile 

Fig. 10. The heuristic technique adopted to obtain the paths satisfying a crite- 
rion. 



Since the path expression directly represents all paths in the graph, it can 
be employed to generate sequences of nodes (test cases) which satisfy any of 
the coverage criteria. Determining the minimum number of paths, from a path 
expression, satisfying a given criterion is in general a hard task. However, heuris- 
tics can be defined to compute an approximation of the minimum. The heuristic 
technique adopted for this work is based on the scheme of Figure 10 (the alter- 
native at line 2 for a loop is whether to re-iterate or not). Definition-use and 
all-uses testing can be achieved by considering, for each data dependence, the 
definition as entry node and the use as exit of the subgraph to be tested. Criteria
such as definition-use and all-paths testing, for which the coverage of possibly
infinite paths should be achieved, could require that only independent paths be
considered or that loops be k-limited.




Fig. 11. Model of a portion of www.m-w.com, including two conditional edges.



Figure 11 shows the portion of the Web site www.m-w.com which provides online
access to the Merriam-Webster English dictionary and thesaurus. A word
can be entered in the initial page (either dictionary.htm or thesaurus.htm).
A dynamic page dictionary is then composed in response to the input word,
stored in variable va. The content of the resulting page depends on the number
of entries found in the dictionary. If there is more than one entry, a selection
list is displayed to the user, together with an explanation of the main entry. The
user can choose among the alternatives - the selection is stored in variable jump 
- and move to a page, still named dictionary, with an explanation of such an 
entry. The model of the site represents the conditional existence of the list of 
alternatives with a ConditionalEdge object associated with the edge labelled b. 
The site offers the possibility to enter a new word from both the dynamic pages 
dictionary and thesaurus, and allows switching from dictionary to thesaurus 
and vice versa from the initial and result pages. 

In order to determine a set of paths to be exercised during white box testing, 
the path expression is computed. Let us consider the portion of the site devoted 
to the extraction of entries from the dictionary (symmetric considerations can 
be made for the thesaurus). The path expression associated with the labelled
edges of Figure 11 is a(bc + de)*. Some of the paths generated from it can be
traversed only if proper inputs are provided, while some other paths are infeasible
for every input. For example, the path abc can be traversed only if the input
word has more than one entry in the dictionary. This condition is quite easy
to achieve and requires only a careful selection of input data. A path whose
infeasibility does not depend on the input is abcbc. In fact, if one of the entries is
selected from the list displayed to the user, the next dynamic page dictionary
that is obtained will not include the list of alternative entries any longer. This
condition is represented as 'not (jump selected)' in Figure 11: if a selection
was performed by the user, edge b does not exist. As a consequence the path
expression cannot be easily exploited for path generation.
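
The difficulty can be reproduced by enumerating the paths denoted by the expression a(bc + de)* with k-limited loops. The sketch below is an exhaustive enumeration, not the heuristic of Figure 10; the tuple encoding of path expressions and the function name paths are assumptions introduced for the example.

```python
from itertools import product

def paths(expr, k):
    """Enumerate the paths denoted by a path expression, iterating
    each loop at most k times (k-limited loops)."""
    kind = expr[0]
    if kind == "edge":                      # a single edge label
        return [expr[1]]
    if kind == "seq":                       # concatenation of parts
        parts = [paths(e, k) for e in expr[1:]]
        return ["".join(combo) for combo in product(*parts)]
    if kind == "alt":                       # selection, operator +
        return [p for e in expr[1:] for p in paths(e, k)]
    if kind == "loop":                      # iteration, operator *
        body = paths(expr[1], k)
        result = []
        for times in range(k + 1):
            result += ["".join(c) for c in product(body, repeat=times)]
        return result
    raise ValueError("unknown node kind: " + str(kind))

# Encoding of a(bc + de)*.
BC = ("seq", ("edge", "b"), ("edge", "c"))
DE = ("seq", ("edge", "d"), ("edge", "e"))
EXPR = ("seq", ("edge", "a"), ("loop", ("alt", BC, DE)))
```

With k = 1 the enumeration yields a, abc and ade. With k = 2 it also yields abcbc, which the expression admits but which cannot be traversed on the site, so generated paths would have to be filtered by hand.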




Fig. 12. Page dictionary was unrolled into dictionary1 and dictionary2.

The problem highlighted above derives from the possibility to use the same
dynamic page for different purposes. Actually, some Web sites consist of just
one dynamic page which displays different information according to an
internally recorded state of the interaction. In other words, while for static sites the
Web page is coincident with the state of the interaction, this is not necessarily
true with dynamic sites. A possible solution to this problem is to perform
an operation of state unrolling on the dynamic pages that are used to display
different contents under different conditions. In the example of Figure 11, page
dictionary is used for two purposes: to propose a list of alternative dictionary
entries and to provide the final result of the search, once a single entry is
identified. These two purposes may be represented explicitly by the two pages
dictionary1 and dictionary2 into which the initial page is unrolled (see
Figure 12). The conditions in the ConditionalEdge objects are now simplified and
verify only the number of dictionary entries. The path expression of this site
portion becomes (ac + bf + bhgc)(dc + ef + ehgc)* and all the paths that can be
generated from it are feasible, provided that an input word with the appropriate
number of dictionary entries is selected.

6 Conclusions and Future Work 

We proposed some analysis and testing techniques working on Web applications. 
The starting point for their definition is a model of Web sites, designed to include 
all characteristics that are relevant from an architectural point of view. Page 
downloading and model construction were achieved by providing input values 
for the forms in the site. Moreover, the site model was enriched with information 
about conditional edges and variable uses, exploited during testing.
Facilities for the display of the resulting model are provided by the analysis
tool ReWeb, while test generation and execution is automated by TestWeb.
Our experience with ReWeb and TestWeb suggests that the choice of a good
model is fundamental for both analysis and testing. We showed some views
and facilities of ReWeb on a real example (www.ubicum.it). Path testing in
presence of an internal representation of the interaction state can be simplified 
by means of a state unrolling operation, which was also described with reference 
to a real world example (www.m-w.com). 

Our future work will be devoted to extending the set of analyses available 
(to include, for example, pattern matching), adding abstraction techniques to 
support a high level view of the site, (partially) automating input selection during 
testing in presence of conditional edges and providing better support to state 
unrolling.

Abstract. Testing is a critical component of the software development
process and is required to ensure the reliability, robustness and usability 
of software. Tools that systematically aid in the testing process are cru- 
cial to the development of reliable software. This paper describes a code- 
based testing and analysis tool for object-oriented software. TATOO pro- 
vides a systematic approach to testing tailored towards object behavior, 
and particularly for class integration testing. The underlying program 
analysis subsystem exploits combined points-to and escape analysis de- 
veloped for compiler optimization to address the software testing issues. 



1 Introduction 

Testing is a critical component of the software development process and is re- 
quired to ensure the reliability, robustness and usability of software. Unfortu- 
nately, testing, and in particular ad hoc testing, is labor and resource intensive, 
accounting for 50%-60% of the total cost of software development [13]. There- 
fore, it is imperative that testing techniques be developed that provide as much 
automation and ease of use as possible. In particular, tools that systematically 
aid in the testing process are crucial to the development of reliable software. 

Object-oriented features enable the programmer to design and develop soft- 
ware that is reusable and modular. Encapsulation, inheritance, and polymor- 
phism are extremely useful to the programmer, but create difficulties for the 
tester. Encapsulation allows the programmer to create classes with state and 
functionality. All instantiated classes possess different state at different times 
through a program’s execution, requiring many different test cases in order to 
adequately test the states of different objects. Complex class interactions are 
introduced through inheritance and class composition. These interactions also 
need to be tested, requiring test cases to exercise the complexities of these class 
relationships. Dynamic binding caused by polymorphism creates additional com- 
plexities when testing object-oriented software. A single polymorphic call site 
represents potential calls to one of a number of different methods with the same
name. Each possible receiver type must be executed in order to adequately test
a program. 

In this paper, we present a Testing and Analysis Tool for Object-Oriented 
programs, TATOO, which embodies our novel code-based testing method focused
on object manipulations [16]. Our tool establishes an environment to systemati- 
cally test programs by providing: (1) automatically generated test tuples based 
on object manipulations, (2) test coverage information through code instrumen- 
tation and test coverage identification, (3) feedback about external influences 
that may affect the correctness of the testing runs, and (4) visualization of the 
underlying program representation that shows the interactions of objects in a 
program. 

TATOO is composed of two subsystems: program analysis and testing. The 
program analysis subsystem’s main objective is to produce the program rep- 
resentation needed for the testing subsystem. The testing subsystem generates 
the test tuples needed for generating test cases as well as provides test cover- 
age information. In addition, the testing subsystem controls the user interface 
environment. 

TATOO is a prototype tool implemented in Java, consisting of several com- 
ponents: 

— The extended FLEX compiler infrastructure [11], which translates the source 
program into a program representation useful for testing. 

— A test tuple generator that generates paths based on object manipulations. 

— A Java graphical user interface that displays the source code and information 
corresponding to the component under test. 

— The daVinci toolkit [8], which displays the program representation in a high 
quality manner, and facilitates communication with the Java GUI, allowing 
interaction between the graphical program representation and the source 
code and testing information. 

— A code instrumentation tool. 

— A trace analyzer, which provides coverage information about an executed 
program based on a given set of test tuples. 

The major contribution of this work is a systematic framework for integration 
testing. The framework is based on a testing technique that focuses on object 
manipulations, and is capable of analyzing incomplete programs. Unlike previous 
work, the technique addresses the common situation of instance variables being 
objects and not primitive types. 

The remainder of this paper is organized as follows. Section 2 provides back- 
ground information on testing and object manipulations. An overview of TATOO 
is given in Section 3. Sections 4 and 5 describe the details of the TATOO subsys- 
tems. In Section 6, we evaluate the overhead attributed to the use of the program 
representation. Finally, we conclude with possible extensions to the tool.

2 Underlying Testing and Analysis Technique

There are two main testing philosophies, namely, black-box and white-box testing. 
Black-box testing [2] does not use any knowledge of the internals of a program. 
The program is a black-box in which information about the source code is un- 
known; we only know what is provided from a program specification. Black-box 
tests are designed to uncover errors, but the focus of such testing is on verifying 
that a specified function operates according to the specification. Black-box test- 
ing has also been referred to as functional testing or specification-based testing. 

White-box, structural, or code-based testing techniques are based on knowledge
of the code. White-box testing is not an alternative to black-box testing; in
fact, they complement each other and both should be performed. Test cases are 
derived through examining the program code and using well-defined data flow 
or control flow information about the program. Control flow-based techniques 
are motivated by the intuition that covering different control-flow paths would 
exercise a large proportion of program behaviors. For example, branch testing [7] 
is a control flow based testing method, which is based on exercising all the true 
and false outcomes of every branch statement. The general idea behind data
flow testing [14,10] is to generate test data based on the pattern of data used
throughout a program.

Data flow testing is based on the premise that testing paths that read and 
write values stored into the same memory locations tests the behavior of a pro- 
gram in terms of its manipulation of data. More specifically, data flow testing is 
based on def-use pairs in a program, where a def of a variable is an assignment 
of a value to the variable via a read or assignment operation, and a use of a 
variable is a reference to the variable, either in a predicate or a computation. 
A def-use pair for variable v is an ordered pair (d,u) where d is a statement in
which v is defined and u is a statement that is reachable by some path from d,
and u uses v or a memory location bound to v. Data flow testing uses def-use
pairs in order to generate paths through the definition and use statements in the 
code. Then test data is generated based on those paths. The idea is that for each 
definition in the program, we want to exercise all of the uses of the definition. 
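
The definition of def-use pairs given above can be turned into a small sketch. It follows the definition literally, pairing a definition with every use reachable from it along some path; it does not additionally check that the path is free of redefinitions. The function name def_use_pairs, the statement encoding and the four-statement example are assumptions.

```python
def def_use_pairs(statements, edges):
    """Collect (variable, def statement, use statement) triples.

    statements: dict mapping a statement id to a pair
                (set of defined variables, set of used variables).
    edges: set of (source id, target id) control-flow edges.
    """
    succs = {s: [t for (u, t) in edges if u == s] for s in statements}

    def reachable_from(d):
        # Statements reachable from d by a path of at least one edge.
        seen, stack = set(), list(succs[d])
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(succs[s])
        return seen

    pairs = set()
    for d, (defs, _) in statements.items():
        for u in reachable_from(d):
            for v in defs & statements[u][1]:
                pairs.add((v, d, u))
    return pairs

# Hypothetical program:
#   1: x = read()    2: if x > 0    3: y = x    4: print(y)
STATEMENTS = {1: ({"x"}, set()), 2: (set(), {"x"}),
              3: ({"y"}, {"x"}), 4: (set(), {"y"})}
FLOW = {(1, 2), (2, 3), (2, 4), (3, 4)}
PAIRS = def_use_pairs(STATEMENTS, FLOW)
```

Each resulting triple identifies a path segment, from the definition to the use, that a test case should exercise.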

Systematic testing techniques are categorized into different levels of testing. 
First, unit testing of object-oriented programs focuses on validating individual 
classes. As classes are combined or integrated together, integration testing is 
performed to validate that the classes function appropriately when combined 
together. Our research has focused on extending data flow testing to the object- 
oriented domain with special concern for testing instance variables that are ob- 
jects and code-based integration testing of object-oriented components. 



2.1 Object Manipulation-Based Testing 

Our approach to code-based testing of object-oriented software seeks to provide 
coverage in terms of the elemental read and write actions, which is similar to data 
flow testing. We call our approach the OMEN approach, because it is based on
covering basic Object Manipulations in addition to using Escape iNformation to
provide helpful feedback to the tester in an interactive testing tool environment.¹
Object-oriented programming focuses on the data to be manipulated rather 
than the procedures that do the manipulating. An object-oriented program 
achieves its goals by creating objects of specific classes. The state of an ob- 
ject is encapsulated as a copy of all of the fields of data that are defined in the 
corresponding class definition. Actions are performed on an object by invoking 
methods defined in the class definition, often called sending a message to the 
object. A method invocation can modify and/or read the data stored in the 
particular object. 



Table 1. Basic object manipulations.

Object-related Statements          Object Manipulations
---------------------------------  ------------------------------
copy r1 = r2                       read of reference r2
                                   write to reference r1

load r1 = r2.f                     read of reference r2
                                   read of field r2.f
                                   write to reference r1

store r1.f = r2                    read of reference r2
                                   read of reference r1
                                   write to field r1.f

global load r = cl.f               read of class variable f
                                   write to reference r

global store cl.f = r              read of reference r
                                   write class variable cl.f

return r                           read of reference r

object creation                    create a new object
r = new Object(....)               write to reference r
                                   MOD and USE

method invocation                  write to reference r
r = r0.methodname(r1,..., rn)      read of references r0-rn
                                   MOD and USE of r0's fields



In order to better understand the possible behaviors of an object-oriented
program in terms of object manipulations, we identify the most elemental object
manipulation as either a read or write action. The actions that a particular
statement or method performs on an object can be decomposed into a sequence
of these elemental actions. Table 1 depicts the elemental object manipulations
performed by each object-related statement. We assume that the program has
been preprocessed such that all statements that perform object manipulations
have been expressed in the form of these basic statements. Due to aliasing and
polymorphism, we may have a set of objects potentially referenced by each
reference, but for these descriptions, we use the singular form. However, our analysis
addresses the potential for a set of objects being referenced.

^ In addition, we view the test cases and the results of executing the test cases as
an omen for predicting the behaviors of the executing program in the production
environment.

We extrapolate the concept of data flow testing to the testing of elemental 
object manipulations by defining a (write, read) association of a given object’s 
state, extending this association to include object creation points. From Table 1, 
we can see that the statement that reads an object field is the load statement, 
while the store statement writes to an object field. To ensure that we do not miss 
any viable (write, read) pairs, we assume that a given load/store statement may 
read/write the field of any object which the reference is potentially referencing 
at that program point. Because objects are instantiated at run-time through 
executable statements, we extend (write, read) pairs to triples of the form (write,
read, object creation) to reflect the fact that a test case should cover the creation
of the object before any writes or reads to that object. 



2.2 Using Escape Analysis Information 

Escape analysis is a relatively new technique used for optimizing object-oriented
code, particularly Java code [17,6,3,4]. The analysis is used to determine which
synchronization operations are unnecessary and could be eliminated, as well as 
for reducing the number of objects that are unnecessarily allocated on the heap 
when they could be allocated on the stack. 

We use escape analysis for testing, in order to provide useful feedback to the 
tester. For example, when a program is being tested and errors are uncovered, 
the tester or developer needs to find the cause of the error, i.e., debug the code. 
Our key insight is that the escape information will provide useful feedback to the 
tester about possible problem areas, where objects interact with outside code, 
which may be causing inadvertent changes to an object. 

3 TATOO System Architecture 

TATOO provides a tester with an interactive testing environment to systemati- 
cally test software using the OMEN testing technique. The prototype testing tool 
allows the tester to visualize a graphical representation of the program, which 
characterizes how objects interact with other objects. In addition, the testing 
tool provides the tester with both visual and report-based coverage information 
about the program under test. TATOO also provides information about how ob- 
jects interact with unanalyzed portions of code, as well as where objects may 
potentially escape through method calls or return statements. This information 
is useful in determining potentially fault prone sections of code. 

The tool architecture is composed of two main subcomponents, namely the 
program analysis subsystem and the testing subsystem, as shown in figure 1. The 
program analysis subsystem performs the required analysis to obtain the annotated
points-to escape (ape) graph program representation described in section
4.1. In addition, a term representation used to view the ape graph and an annotation
table, which maintains the information required by the testing
subsystem, are generated. After program analysis is performed, the testing subcomponent
computes test tuples for testing the program component under test.
In addition, the testing subcomponent provides a graphical user interface en- 
vironment that supports two primary features: test coverage identification and 
program representation visualization. The test coverage identifier provides an 
environment that lets the user execute the program and then visualize coverage 
information. In contrast, the primary purpose of the program representation
visualizer is to allow the user to visualize object interactions.




Fig. 1. Overall tool architecture. 



4 Program Analysis Subsystem 

Figure 2 depicts the subcomponents of the program analysis subsystem. Java 
bytecode produced from any Java source compiler is the input to the program 
analysis subsystem. The FLEX static analyzer [11] translates the Java bytecode
into an intermediate format based on object manipulations, and then performs
points-to analysis to construct the Annotated Points-to Escape (ape) graph [16].
Annotations necessary for calculating read-write relationships between fields of 
objects are an important aspect of the ape graph. 

The ape graph object, produced by the FLEX static analyzer, is then parsed 
to construct a graphical term representation needed to visually display the ape 
graph. In addition, a textual representation of the annotation table is generated, 
which consists of the annotations for each edge in the ape graph. The final 
output of the program analysis subsystem is the ape graph object, the ape graph 
graphical representation, and the annotation table.

Fig. 2. Program analysis subsystem.



4.1 Ape Graph Program Representation 

To develop the ape graph program representation, we extended and modified 
the points-to escape graph program representation [17] to exploit its ability to 
mimic object manipulations. The points-to escape graph representation com- 
bines points-to information about objects with information about which object 
creations and references occur within the current analysis region versus outside 
this program region. For our purposes, the current analysis region is the current 
component under test (CUT), where a component is not necessarily a class or 
method, but any grouping of methods. The points-to information characterizes 
how local variables and fields in objects refer to other objects. The escape in- 
formation can be used to determine how objects allocated in one region of the 
program can escape and be accessed by another region of the program. 

In the points-to escape graph, nodes represent objects that the program 
manipulates and edges represent references between objects. Each kind of object 
that can be manipulated by a program is represented by a different set of nodes 
in the points-to escape graph. There are two distinct kinds of nodes, namely, 
inside and outside nodes. An inside node represents an object creation site for 
objects created and reached by references created inside the current analysis 
region of the program. In contrast, an outside node represents objects created 
outside the current analysis region or accessed via references created outside 
the current analysis region. There are several different kinds of outside nodes, 
namely, parameter nodes, load nodes, and return nodes. 

The distinction between inside and outside nodes is important because it 
is used to characterize nodes as either captured or escaped. A captured node 
corresponds to the fact that the object it represents has no interactions with 
unanalyzed regions of the program, and the edges in the graph completely char- 
acterize the points-to information between objects represented by these nodes. 
On the other hand, an escaped node represents the fact that the object escapes 
to unanalyzed portions of the program. An object can escape in several ways:
a reference to the object was passed as a parameter to the current method, a
reference to the object was written into a static class variable, a reference was
passed as a parameter to an invoked method and there is no information about
the invoked method, or the object is returned as the return value of the current
method.

There are also two different kinds of edges. An inside edge represents ref- 
erences created inside the current analysis region. An outside edge represents 
references created outside the current analysis region.

We have extended the points-to escape graph by adding annotations to edges
in the graph. The annotations provide information about where basic object
manipulations, i.e., loads and stores of objects, occur within a program. Using
the annotations, we are able to compute store-load, i.e., (write, read), pairs for
the objects in the program, which can be used in a manner similar to data flow
testing.

We build one ape graph for each method in the CUT. For each
load/store of the reference represented by a particular edge e in an ape graph,
we maintain:

- a sequence of statement numbers, $(s_1, s_2, \ldots, s_n)$, where $s_n$ is the unique
statement number of the load/store statement; $s_1, s_2, \ldots, s_{n-1}$ are the
statement numbers of the call sites where this edge was merged into the caller's
ape graph during interprocedural analysis performed during construction of
the current method's ape graph. Statement $s_1$ is the statement number of
the call site within the current analysis method which eventually leads to
the load/store statement.

- a corresponding sequence of statement numbers, $(evs_1, evs_2, \ldots, evs_n)$, where
each $evs_i$ is the unique number of the earliest statement at which the statement
$s_i$ could have an effect on other statements. We call this the earliest
visible statement for $s_i$, $evs_i$. The earliest visible statement $evs_i = s_i$ when
the statement $s_i$ is not inside a loop; otherwise $evs_i$ is the statement number
of the header of the outermost loop containing $s_i$.
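
The rule for the earliest visible statement can be illustrated with a small Java sketch. This is only an illustration under stated assumptions, not the FLEX or TATOO implementation: the class `EarliestVisible`, the `Loop` type, and the idea of describing a loop by its header and last statement number are all hypothetical, and statement numbers are assumed to increase in program order with properly nested loops.

```java
import java.util.List;

/** Sketch of the earliest-visible-statement rule; all names here are hypothetical. */
final class EarliestVisible {

    /** A loop given by its header statement and the last statement of its body. */
    static final class Loop {
        final int header;
        final int last;

        Loop(int header, int last) {
            this.header = header;
            this.last = last;
        }

        boolean contains(int stmt) {
            return header <= stmt && stmt <= last;
        }
    }

    /** Returns stmt when it lies in no loop, else the header of the outermost enclosing loop. */
    static int evs(int stmt, List<Loop> loops) {
        int result = stmt;
        for (Loop loop : loops) {
            // With properly nested loops, the outermost enclosing loop has the smallest header.
            if (loop.contains(stmt) && loop.header < result) {
                result = loop.header;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Loop> loops = List.of(new Loop(10, 30), new Loop(15, 20));
        System.out.println(evs(5, loops));   // statement outside every loop
        System.out.println(evs(17, loops));  // statement inside both loops
    }
}
```

In this sketch, statement 5 is its own earliest visible statement, while statement 17 maps to the header of the outer loop at statement 10.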



4.2 Example Annotation Construction 

Figure 3 shows an example set of annotations added to a single edge of an ape 
graph. The nodes in this graph represent objects created within the current anal- 
ysis region; therefore, they are both inside nodes. The edge labeled top represents 
a reference from a field named top of the object of type Stack, annotated with 
both a load and store annotation. The annotations indicate that there exist both 
a load and store of the field top. Further, the locations where the load and store
occur are maintained through the annotations. The annotation (store 25-13-3)
represents a chain of two calls: the first is invoked on line 25 of the program, and the
second, invoked on line 13, can lead to a store of an object into the field top at line 3.
Similarly, the load of the field top occurs at line 7, following a chain of calls from
lines 27 and 14. The above example does not include evs statement numbers,
but they would be maintained in the same manner.

The annotation chains are easily constructed because the ape graphs for 
individual methods of the CUT are built during a reverse topological traversal 
over the call graph. Therefore, a callee’s graph is always constructed before its 
callers’ graphs. When constructing a caller’s graph, the callees’ annotations are 
simply merged into the caller’s graph at the appropriate call site. 
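
The merging step can be sketched in Java as follows. This is a minimal illustration, assuming an annotation chain is stored as a list of statement numbers; the class `AnnotationMerge` and the method `mergeAtCallSite` are hypothetical names, not part of FLEX or TATOO. Merging a callee's chain into a caller amounts to prefixing the chain with the statement number of the call site.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of merging callee annotation chains into a caller; names are hypothetical. */
final class AnnotationMerge {

    /** Prefixes every callee chain with the statement number of the caller's call site. */
    static List<List<Integer>> mergeAtCallSite(int callSite, List<List<Integer>> calleeChains) {
        List<List<Integer>> merged = new ArrayList<>();
        for (List<Integer> chain : calleeChains) {
            List<Integer> extended = new ArrayList<>();
            extended.add(callSite);
            extended.addAll(chain);
            merged.add(extended);
        }
        return merged;
    }

    public static void main(String[] args) {
        // A store at line 3, reached through call sites at lines 13 and 25.
        List<List<Integer>> inCallee = List.of(List.of(3));
        List<List<Integer>> inCaller = mergeAtCallSite(13, inCallee);
        List<List<Integer>> inRoot = mergeAtCallSite(25, inCaller);
        System.out.println(inRoot); // prints [[25, 13, 3]]
    }
}
```

Because callees are processed before their callers, each chain grows by exactly one call site per level of the call graph, which yields chains such as 25-13-3.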

To emphasize the expressiveness of the ape graph, we include a complete ape
graph for a simple method. Figure 4 illustrates the ape graph for method push.
The black nodes represent outside nodes and the white nodes represent inside
nodes. The graph was built by processing each statement in the method push.
The constructor call to Node on line 3 maps the nodes from the Node graph
into the ape graph for push, creating the edges (data, 3-7, store) and (next, 3-8,
store). The annotations 3-7 and 3-8 indicate that a store occurred on lines 7 and
8, through a call at line 3. A similar mapping occurs at line 5, through the call
to insert, creating the other data and next edges. Due to space limitations, the
code for the method insert does not appear.

[Figure: an edge labeled top, annotated "store: 25-13-3" and "load: 27-14-7", connecting an object of type Stack and an object of type Data.]

Fig. 3. Illustration of ape graph annotation.

1: public void push(Object e) {
2:   if (top == null)
3:     top = new Node(e, null);
4:   else
5:     top = top.insert(e); }
6: Node(Object e, Node n) {
7:   data = e;
8:   next = n; }

Fig. 4. Example of complete ape graph for a single method.




Fig. 5. Testing subsystem.

5 Testing Subsystem

The testing subsystem, shown in figure 5, takes as input the ape graph object, 
the term representation necessary for graphically viewing the ape graph, and 
the annotation table, all generated from the program analysis subsystem. The 
following subsections describe the components of the testing subsystem, namely 
the test tuple generator and the test coverage identifier. 



5.1 Test Tuple Generator 



Algorithm 1. Compute testing tuples for a component represented by a set of
call graphs, each with possibly multiple roots.

Input: set of call graphs and ape graphs for the CUT;
Output: set of test tuples for the CUT and feedback on potential influences from outside the CUT;

/* Process each method's ape graph */
foreach node n in a topological ordering of call graph nodes do
    Let m = method represented by node n;
    foreach edge e labeled STORE in m's ape graph do
        /* Create tuples from stores in ape graph */
        Identify associated loads, labeling e, occurring after the STORE;
        /* Using node type and escape information create tuple or report feedback */
        if source node of e is an inside node then
            Replace tuple (store, load) by (cs_sn, store, load);
        else /* source node is an outside node */
            Feedback(object for (store, load) is potentially created outside CUT);
        if source node not escaped and target node is escaped then
            Feedback(value loaded in (cs_sn, store, load) is potentially changed by method
                     outside CUT, but l is indeed referencing object created at cs_sn);
    endfor
    foreach edge e in ape graph labeled only by LOAD do
        if target node is a load node in ape graph then
            Feedback(load at statement cs_l in method m has potentially reaching references
                     from outside CUT);
    endfor
endfor



The test tuple construction algorithm, shown in Algorithm 1, computes a 
set of test tuples for the component under test (CUT), based on object manip- 
ulations. Starting at the root of each call graph of the CUT and proceeding in 
topological order, the method for each call graph node is processed once, by 
analyzing the node’s ape graph. This processing order avoids creating duplicate 
tuples potentially identified due to subgraphs of invoked methods also appearing 
in a caller’s ape graph. As a particular ape graph is analyzed, only unmarked 
edges (those not already processed in a caller’s graph) are processed. 

The algorithm processes each edge in a method’s ape graph. For each annota- 
tion on an ape graph edge representing a store, the associated loads potentially 
occurring after the store are identified, and a (store, load) tuple is created. The 
annotations reflect the results of the flow sensitive points-to escape analysis used
to build the ape graph. Thus, the evs and cs statement numbers on these annotations
are adequate to identify the reachable loads from a particular store. The
object creation site associated with the (store, load) tuple is determined by the
source node of the edge being analyzed. If the source node is an inside node, then
the source node is the object creation site and the node number of the source
node is used to complete the (store, load) tuple. If the source node
is an outside node, then the object is not created inside the CUT, and feedback is
given depending on the kind of the source node and whether it is an interior node or a root
of the call graph. Additionally, feedback is given when the target node of the
ape graph edge being analyzed has escaped from the CUT.

The algorithm also provides feedback for load nodes when a corresponding 
store is not present in CUT. This is represented by an ape graph edge that 
is labeled only with load annotations and no store annotations. The feedback 
provides the tester with information about the fact that an object creation site 
could have potentially occurred outside CUT, as well as the possibility that the 
load in CUT has potentially reaching references from outside CUT. 
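
The core of the tuple construction can be sketched in Java. This is a simplified illustration, not the TATOO implementation: the `Edge` type and its fields are hypothetical, every load on an edge is conservatively paired with every store (the check that a load occurs after the store, which the algorithm derives from the evs and cs numbers, is omitted), and the escape-based feedback on target nodes is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified sketch of test tuple construction; all names are hypothetical. */
final class TupleSketch {

    /** One annotated ape graph edge. */
    static final class Edge {
        final boolean sourceInside;  // true if the source node is an inside node
        final int creationSite;      // creation site of the source node (ignored for outside nodes)
        final List<Integer> stores;  // statement numbers of the store annotations
        final List<Integer> loads;   // statement numbers of the load annotations

        Edge(boolean sourceInside, int creationSite, List<Integer> stores, List<Integer> loads) {
            this.sourceInside = sourceInside;
            this.creationSite = creationSite;
            this.stores = stores;
            this.loads = loads;
        }
    }

    /** Emits (creation site, store, load) tuples for inside nodes and feedback otherwise. */
    static void process(List<Edge> edges, List<String> tuples, List<String> feedback) {
        for (Edge e : edges) {
            if (e.stores.isEmpty()) {
                // Edge labeled only by loads: no corresponding store inside the CUT.
                for (int load : e.loads) {
                    feedback.add("load at " + load + " may see references from outside the CUT");
                }
                continue;
            }
            for (int store : e.stores) {
                for (int load : e.loads) {
                    if (e.sourceInside) {
                        tuples.add("(" + e.creationSite + ", " + store + ", " + load + ")");
                    } else {
                        feedback.add("object for (" + store + ", " + load
                                + ") may be created outside the CUT");
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> tuples = new ArrayList<>();
        List<String> feedback = new ArrayList<>();
        process(List.of(new Edge(true, 3, List.of(7), List.of(12))), tuples, feedback);
        System.out.println(tuples); // prints [(3, 7, 12)]
    }
}
```

The sketch keeps tuples and feedback in separate lists, mirroring the two outputs of Algorithm 1.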

5.2 Test Coverage Identifier 




Fig. 6. Test Coverage Identifier. 



The test coverage identifier provides an environment useful for providing 
test coverage information about the component under test. Figure 6 illustrates 
the subcomponents of the test coverage identifier. There are two user events,
INSTRUMENT and RUN, which the event manager understands. An INSTRUMENT
event invokes the code instrumenter, which takes the program source
code and test tuple table as input. The source line numbers corresponding to
each (store, load, object creation site) tuple form the entries of the test tuple table,
providing all the information necessary to instrument the source code. Simple print
statements are inserted into the source code to produce instrumented code. The
second event, RUN, invokes the component under test, prompting the user, if
necessary, for input data, running the program on the JVM, and generating a test
tuple trace file, in addition to the normal output of the program. The test tuple
trace file is generated by running the instrumented source code, and it provides
information about the test tuples covered during the program execution.
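
How a trace might be checked against a test tuple can be sketched in Java. The trace format is an assumption made for illustration: we suppose the inserted print statements emit the source line number of each instrumented statement as it executes, so that a trace is a list of line numbers. The class `TraceCoverage` and its method are hypothetical, and the check is purely line-based, so it does not distinguish between different objects created at the same site.

```java
import java.util.List;

/** Sketch of line-based coverage checking against a trace; names are hypothetical. */
final class TraceCoverage {

    /** True if creation, store, and load lines occur in the trace in that order. */
    static boolean covered(List<Integer> trace, int creation, int store, int load) {
        int[] wanted = {creation, store, load};
        int next = 0;
        for (int line : trace) {
            if (next < wanted.length && line == wanted[next]) {
                next++;
            }
        }
        return next == wanted.length;
    }

    public static void main(String[] args) {
        List<Integer> trace = List.of(3, 7, 9, 12);
        System.out.println(covered(trace, 3, 7, 12)); // prints true
        System.out.println(covered(trace, 3, 12, 7)); // prints false
    }
}
```

The ordering requirement reflects the idea that a test case should exercise the object creation before the store and the store before the load.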

Currently, we maintain coverage information for one run of the program. 
In the future, we will maintain information for multiple runs of the program,
thereby providing coverage information for an entire test suite
run on the component under test. The trace analyzer currently provides
visual coverage information by analyzing the test tuple trace file and highlighting 
the source code and the corresponding test tuples that were covered during the 
program execution. 



5.3 Ape Graph and Test Tuple Visualizer 

The primary function of TATOO's ape graph and test tuple visualizer is to visualize
the ape graph representation of the CUT, which graphically displays object
interactions. This subsystem is composed of the daVinci toolkit [8], a graphical 
user interface, and an event manager that communicates with daVinci, providing 
interaction between the source code, ape graph, annotations, and test tuples. 




Fig. 7. The TATOO user interface including daVinci.

Visualization of the ape graph is achieved through the daVinci graph drawing
tool, which is an X-Window visualization tool for drawing high quality
directed graphs automatically [8]. The term representation, produced by the 
program analysis subsystem, is the graphical representation used as input to 
daVinci. DaVinci not only displays the graph, but allows interaction between 
the graph and an external program. Our event manager communicates with 
daVinci through the API defined by the daVinci toolkit. 

The graph visualizer interface of TATOO is shown in figure 7. A user can view 
the source code for a class, the annotations and test tuples produced for that 
class, and a graphical representation of the ape graph. In addition, the user is 
able to selectively view the ape graph and the source code, annotations, and test 
tuples corresponding to the selected graph segment. For example, a user can click 
on an edge in the daVinci graph window, and the source code, annotations, and 
test tuples corresponding to the selected edge are highlighted in their respective 
windows. The graph visualizer allows for a fine-grained level of detail corresponding
to how objects interact with other objects in a program. Its usefulness stems
from the ability to statically visualize an object’s fields and the points-to re- 
lationships constructed throughout the program. The escape information could 
also be visualized in a similar fashion, allowing for the user to click on the escape 
information, which would identify the region of code and the section of the graph 
that an object could escape through. The escape information is available to us, 
but this capability has not been included in our current prototype testing tool. 



6 Implementation and Evaluation 



TATOO has been implemented in Java and evaluated with a set of Java programs. 
Table 2 shows some general characteristics of the benchmark programs we have 
used with TATOO, as well as the storage requirements necessary for the ape 
graph. The characteristics include the number of lines of user code, the number 
of JVM instructions, the number of classes analyzed, and the number of methods 
analyzed. We have reported these numbers separately for user and library sizes 
in order to show that a relatively small program may rely heavily on libraries; 
therefore, analysis of the user program depends not only on the user code, but 
on the library code as well. 



Table 2. Program characteristics and storage requirements.

Name      Problem Domain     # of    jvm instr            classes      methods              ave size (Kb)   max size
                             lines   User         Lib     User   Lib   User   Lib           User    Lib     (Kb)
compress  text compression    910    2500         7070     17     90    50    301            6.1    1.1      43.0
db        database retrieval 1026    2516        11648      9    100   240    306           14.5    0.9      98.6
mpeg      audio decompr      3600    12019        7188     49     92    58    383            5.9    1.0     201
jlex      scanner generator  7500    [illegible]  7250     19     72   106    [illegible]   22.6    1.0     379
jess      expert system      9734    15200       13005    108    105   468    436           13.9    1.1     668

One concern in developing a code-based testing tool is the overhead of the
underlying static analysis. We have experimentally evaluated the space overhead 
of our static analysis. The last three columns of table 2 show the average storage 
requirements of the ape graph per method (user and library), and the maximum
ape graph storage requirement per benchmark. We computed the storage re- 
quirements by computing the sum of two products. The first product is the total 
number of nodes over all the ape graphs times the size of an ape graph node, and 
the second product is the total number of edges over all the ape graphs times 
the size of an edge. The storage requirement per ape graph is relatively small. 
The compositional nature of the ape graphs avoids the requirement of keeping 
all ape graphs in memory at once. The maximum ape graph size is the size
needed to maintain the main method of the program. Essentially, the maximum
ape graph contains the graphs from all of its callees, which were merged into
it.
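
The sum of two products described above can be written compactly. The symbols are labels introduced here for illustration, not notation from the tool: $N$ and $E$ are the total numbers of nodes and edges over all the ape graphs of a benchmark, and $b_{\text{node}}$ and $b_{\text{edge}}$ are the sizes of one ape graph node and one edge.

```latex
\[
  S_{\text{total}} = N \cdot b_{\text{node}} + E \cdot b_{\text{edge}}
\]
```

Under this reading, the per-method averages in Table 2 would presumably follow from dividing $S_{\text{total}}$ by the number of user or library methods analyzed, although the text does not state this step explicitly.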

The test tuple construction algorithm takes one pass over the call graphs rep- 
resenting the CUT. For each node in the call graph, it processes each unmarked 
edge of the ape graph for that method exactly once. The ape graph is easily 
extendible and the computation of additional test tuples can be performed in a 
demand-driven way as clients are added. 

7 Related Work 

Two proposed code-based testing tools based on data flow testing techniques are
the Coupling Based Coverage Tool (CBCT) [1] and Orso's testing tool [12].
CBCT is a coverage-based testing tool that reports coverage metrics to the user
by instrumenting the source code. CBCT is based on a coupling-based testing
technique, which is a data flow method based on coupling relationships that
exist among variables across call sites, and is useful for integration testing. To
our knowledge, CBCT has not been implemented. Orso's technique for
integration testing is also based on data flow testing, in particular for
polymorphic test coverage [12]. His tool is similar
to ours, but uses a different program representation, which does not provide
testing feedback to the user. There is also no mention of how the tool deals with
references or instance variables that are objects of different types in Java code.

Previous work on structural testing of object-oriented software has concen- 
trated on data flow analysis for computing def-use associations for classes [9], 
testing of libraries in the presence of unknown alias relationships between pa- 
rameters and unknown concrete types of parameters, dynamic dispatches, and 
exceptions [5], and developing a set of criteria for testing Java exception handling 
constructs [15]. 

8 Conclusions and Future Work 

We have presented TATOO, a testing and analysis tool for object-oriented software.
The primary benefits of this tool are its ability to automatically generate
code-based test tuples for testing, to determine test coverage through code instrumentation
and test coverage notification, to provide feedback to the tester
about external object interactions that could affect their result, and to visually
display object interactions through the daVinci graph drawing tool.

In the future, we plan on extending this work in several ways. First, we plan 
on designing and implementing a new component to TATOO which automatically 
or semi-automatically generates test cases from the test tuples. Then, we plan
to perform an experimental evaluation on different variations of the OMEN 
approach. Finally, we plan on designing and implementing a regression testing 
component for TATOO which will indicate portions of code that need to be 
re-tested after modifications have been made to the original code.

Abstract. Multi-valued logics support the explicit modeling of uncertainty and
disagreement by allowing additional truth values in the logic. Such logics can be 
used for verification of dynamic properties of systems where complete, agreed 
upon models of the system are not available. In this paper, we present an im- 
plementation of a symbolic model checker for multi-valued temporal logics. The 
model checker works for any multi-valued logic whose truth values form a quasi- 
boolean lattice. Our models are generalized Kripke structures, where both atomic 
propositions and transitions between states may take any of the truth values of a 
given multi-valued logic. Properties to be model checked are expressed in CTL, 
generalized with a multi-valued semantics. The design of the model checker is 
based on the use of MDDs, a multi-valued extension of Binary Decision Dia- 
grams. We describe MDDs and their use in the model checker. We also give its 
theoretical time complexity and some preliminary empirical performance data. 



1 Introduction 

Multi-valued logics provide an interesting alternative to classical boolean logic for mod- 
eling and reasoning about systems. By allowing additional truth values in the logic, they 
support the explicit modeling of uncertainty and disagreement. For these reasons, they 
have been explored for a variety of applications in databases [12], knowledge represen- 
tation [13], machine learning [17], and circuit design [15]. 

A number of specific multi-valued logics have been proposed and studied. For example,
Lukasiewicz [16] first introduced a three-valued logic to allow for propositions
whose truth values are ‘unknown’, while Belnap [1] proposed a four-valued logic that
also introduces the value ‘both’ (i.e. “true and false”), to handle inconsistent assertions
in database systems. Each of these logics can be generalized to allow for different levels
of uncertainty or disagreement. In practice, it is useful to be able to choose different
multi-valued logics for different modeling tasks.

The motivations that led to the development of these logics clearly apply to the 
modeling of software behaviour, especially the exploratory modeling used in the early 
stage of requirements engineering and architectural design: 

- We need to allow for uncertainty - for example, we may not yet know whether
some behaviours should be possible; 

- We need to allow for disagreement - for example, different stakeholders may dis- 
agree about how the systems should behave; 

- We need to represent relative importance - for example, in the case where some 
behaviours are essential and others may or may not be implemented. 



T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 404-419, 2001.

For reasoning about dynamic properties of systems, existing modal logics can be extended
to the multi-valued case. Fitting [10] suggests two different approaches for doing
this: the first extends the interpretation of atomic formulae in each world to be multi- 
valued; the second also allows multi-valued accessibility relations between worlds. The 
latter approach is more general, and can readily be applied to the temporal logics used 
in automated verification [6].

Some automated tools for reasoning with multi-valued logics exist. In particular,
the work of Hähnle and others [14,19] has led to the development of several theorem-provers
for first-order multi-valued logics. However, as yet the question of model checking
for multi-valued modal logics has not been addressed.

In this paper we describe our implementation of a multi-valued symbolic CTL 
model checker, Xchek. Xchek is generalized for an entire family of multi-valued log- 
ics, known as the quasi-boolean logics. It takes as its input a description of a particular 
quasi-boolean logic, represented as a lattice of truth values, a state machine model, rep- 
resented as a multi-valued Kripke structure, and a temporal logic property expressed in 
CTL. It returns the truth value that the property has in the initial state(s). 

The paper is structured as follows. Section 2 motivates the work with an example 
of a multi-valued state machine model. Section 3 describes the family of quasi-boolean 
multi-valued logics, and shows how these are specified as lattices of truth values. Sec- 
tion 4 explains our approach, describing our multi-valued extension of Kripke structures 
and our multi-valued extension of CTL. Section 5 presents the design of the model 
checker and analyses its performance. Section 6 presents our conclusions. 

2 Motivation 

To motivate the development of our model checker, and to illustrate its application, we 
present an example state machine model expressed in a multi-valued logic. The model 
captures an early draft of the requirements for a simple coffee dispenser. We distinguish 
behaviours that must be true (are required), behaviours that should be true (are desired, 
but not required), behaviours that should not be true (are undesirable), and behaviours 
that must not be true (are prohibited). We use two types of unknown: Don’t Know for 
things that will be controlled by the system, where we do not yet know what behaviours 
we want; and Don’t Care for things that are controlled by the environment, where the
value does not matter. We represent these six possibilities in a 6-valued logic, arranged 
as a lattice in Figure 1(a), using the partial order ‘more true than’. 

Figure 1(b) shows the model. Each variable is assigned a truth value in each state. 
Each transition between states is also labeled with a truth value. The coffee dispenser 
starts in state OFF. In this state, it is irrelevant whether there is a cup in the machine, 
so that variable has the value ‘DC’ (“don’t care”). The specification team have not yet 
decided whether they need a power-saving standby mode. They model their uncertainty 
by including the state IDLE, but label the transitions into it ‘S’, indicating these may 
be desirable. They also use the value ‘DK’ (“Don’t Know”) for the state of the power 
in IDLE, and for the transition from OFF to IDLE. However, the transition from OFF to 
READY is labeled ‘T’, indicating that when the power is switched on, the machine must 
enter the READY state. From there, it must be able to deliver coffee, and it should then

Fig. 1. (a) A lattice of truth values; (b) The coffee dispenser model that uses it. 

be able to deliver foam. The transitions from COFFEE and FOAM to OFF are labeled ‘N’.
These are undesirable, but we cannot prohibit them because the machine has no direct
control over the power supply. Note that by convention we omit all ‘F’ transitions.
Hence there is an ‘F’ self-loop for COFFEE and FOAM, indicating we must not stay in
either state, and an ‘F’ transition from READY to FOAM, indicating this must not occur.

We can now write properties that ought to be true of the model, even though it 
contains some uncertainties. For example: 

1. The machine must always be able to make coffee.

2. It is desirable that the machine make foam. 

3. Coffee cannot be dispensed if there is no cup. 

4. Once coffee is dispensed, we cannot get coffee again until the cup is changed. 

We formalize these properties in Section 4 and give results of model checking them on 
the coffee dispenser model in Table 1 of Section 5. 

In this example, the use of a 6-valued logic allows us to distinguish two levels of 
priority for requirements, and two different types of unknown. We could choose differ- 
ent multi-valued logics if we wanted to distinguish further levels of priority, or different 
types of ‘unknown’. We are also interested in modeling disagreement, and have devel- 
oped a method for reasoning about whether disagreements between stakeholders’ views 
affect various system properties. In [9] we outline a general framework for combining 
inconsistent state machine models into a single model using multi-valued logics to cap- 
ture levels of (dis)agreement. We eventually plan to use the model checker described 
below as a negotiation tool for constructing and reasoning about such models. 



3 Specifying the Logics 

Our approach to modeling makes use of an entire family of multi-valued logics. Rather 
than giving a complete axiomatization for each logic, we simply give a semantics by 
defining conjunction, disjunction and negation on the truth values of the logic, and
restrict ourselves to logics where these operations are well-defined, and satisfy commutativity,
associativity etc. Such properties can be easily guaranteed if we require that the
truth values of the logic form a lattice. In this section we describe the types of lattices 
used in our logics. 

Definition 1. A lattice is a partial order $(\mathcal{L}, \sqsubseteq)$ for which a unique greatest lower bound
and least upper bound, denoted $a \sqcap b$ and $a \sqcup b$, respectively, exist for each pair of
elements $(a, b)$.

$a \sqcap b$ and $a \sqcup b$ are referred to as meet and join, respectively. The partial order operation
$a \sqsubseteq b$ means “$b$ is at least as true as $a$”. The following properties hold for all lattices:

$a \sqcup a = a \qquad a \sqcap a = a$ (idempotence)

$a \sqcup b = b \sqcup a \qquad a \sqcap b = b \sqcap a$ (commutativity)

$a \sqcup (b \sqcup c) = (a \sqcup b) \sqcup c \qquad a \sqcap (b \sqcap c) = (a \sqcap b) \sqcap c$ (associativity)

Definition 2. A lattice is complete if it includes a meet and a join for every subset of
its elements. Every complete lattice has a top ($\top$) and a bottom ($\bot$).

$\bot = \sqcap \mathcal{L}$ ($\bot$ characterization) $\qquad \top = \sqcup \mathcal{L}$ ($\top$ characterization)

For example, in the lattice of Figure 1(a), $\top$ is labeled ‘T’ and $\bot$ is labeled ‘F’. We
adopt the convention of labeling $\top$ and $\bot$ in this way in all our lattices. Also, we only
use lattices that have a finite number of elements. Every finite lattice is complete.

Definition 3. A finite lattice $(\mathcal{L}, \sqsubseteq)$ is quasi-boolean [2] if there exists a unary operator
$\neg$ defined for it, with the following properties ($a$, $b$ are elements of $\mathcal{L}$):

$\neg(a \sqcap b) = \neg a \sqcup \neg b$ (De Morgan) $\qquad \neg\neg a = a$ ($\neg$ involution)

$\neg(a \sqcup b) = \neg a \sqcap \neg b \qquad a \sqsubseteq b \Leftrightarrow \neg a \sqsupseteq \neg b$ ($\neg$ antimonotonic)

Thus, $\neg a$ is a quasi-complement of $a$.

The family of multi-valued logics we use are exactly those logics whose truth values 
form a quasi-boolean lattice. Meet and join in the lattice of truth values define conjunc- 
tion and disjunction operators, respectively, and we assume that an appropriate negation 
operation is defined with the properties required by Definition 3. The identification of 
a suitable negation operator is greatly simplified by the observation that quasi-boolean 
lattices are symmetric about their horizontal axes: 

Definition 4. A lattice $(\mathcal{L}, \sqsubseteq)$ is horizontally-symmetric if there exists a bijective function
$H : \mathcal{L} \to \mathcal{L}$ such that for every pair $a, b \in \mathcal{L}$,

$a \sqsubseteq b \Leftrightarrow H(a) \sqsupseteq H(b)$ (order-embedding) $\qquad H(H(a)) = a$ ($H$ involution)

Theorem 1. [6] Horizontal symmetry is a necessary and sufficient condition for a lattice
to be quasi-boolean with $\neg a = H(a)$ for each element of the lattice.

The negation of each element is then defined as its image through horizontal symmetry¹.
For example, in Figure 1(a) we have $\neg$T = F, $\neg$S = N, $\neg$DK = DK, $\neg$DC = DC, etc. Finally,
we define an operator $\to$ as follows:

$a \to b \equiv \neg a \sqcup b$ (definition of $\to$)
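As an illustration of Definitions 1 to 4, the following Python sketch encodes a six-valued lattice and checks the quasi-boolean properties by brute force. The ordering used here (F below N, N below both DK and DC, those two below S, S below T, with DK and DC incomparable) is an assumption about Figure 1(a), chosen to agree with the negations listed above. The names `ELEMS`, `COVERS`, `NEG`, `leq`, `meet`, `join`, `neg`, `implies` and `is_quasi_boolean` are illustrative only and are not part of Xchek.

```python
from itertools import product

# Assumed shape of the six-valued lattice: pairs are (lower, upper) covers.
ELEMS = ["F", "N", "DK", "DC", "S", "T"]
COVERS = [("F", "N"), ("N", "DK"), ("N", "DC"),
          ("DK", "S"), ("DC", "S"), ("S", "T")]
# Negation as reflection about the horizontal axis (DK and DC lie on the axis).
NEG = {"T": "F", "F": "T", "S": "N", "N": "S", "DK": "DK", "DC": "DC"}


def build_order(elems, covers):
    """Reflexive-transitive closure of the covering relation."""
    order = {(a, a) for a in elems} | set(covers)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(order), repeat=2):
            if b == c and (a, d) not in order:
                order.add((a, d))
                changed = True
    return order


ORDER = build_order(ELEMS, COVERS)


def leq(a, b):
    """True when b is at least as true as a."""
    return (a, b) in ORDER


def join(a, b):
    """Least upper bound of a and b."""
    uppers = [u for u in ELEMS if leq(a, u) and leq(b, u)]
    return next(u for u in uppers if all(leq(u, v) for v in uppers))


def meet(a, b):
    """Greatest lower bound of a and b."""
    lowers = [l for l in ELEMS if leq(l, a) and leq(l, b)]
    return next(l for l in lowers if all(leq(v, l) for v in lowers))


def neg(a):
    return NEG[a]


def implies(a, b):
    """The derived operator: (not a) joined with b."""
    return join(neg(a), b)


def is_quasi_boolean():
    """Brute-force check of De Morgan, involution and antimonotonicity."""
    for a, b in product(ELEMS, repeat=2):
        if neg(meet(a, b)) != join(neg(a), neg(b)):
            return False
        if neg(join(a, b)) != meet(neg(a), neg(b)):
            return False
        if leq(a, b) != leq(neg(b), neg(a)):
            return False
    return all(neg(neg(a)) == a for a in ELEMS)
```

Under the assumed ordering, `is_quasi_boolean()` holds, and the two unknown values join to S and meet to N. A different lattice can be tried by editing `ELEMS`, `COVERS` and `NEG`.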



¹ Note that we still have to choose how to negate any elements that fall on the axis of symmetry.

4 Multi-valued Model Checking

CTL model checking on two-valued logics was introduced by Clarke and his colleagues 
in [8]. CTL is a branching-time temporal logic that allows quantification over individ- 
ual paths in a tree of computations exhibited by a model. There are five basic pairs of 
operators: AX and EX (“next”), AF and EF (“eventually” or “in the future”), AG 
and EG (“globally”), AU and EU (“until”), AR and ER (“release”). Models are represented
as Kripke structures, which are finite-state machines that guarantee that there
is a transition out of every state. See [7] for a detailed account of CTL model checking.

In this section we describe our multi-valued extension of Kripke structures, which 
we call XKripke structures, and we give the semantics of multi-valued CTL [6]. 

4.1 Defining the Model 

A state machine M is a XKripke structure if $M = (S, S_0, R, I, A, L)$, where:

- L is a quasi-boolean logic represented by a lattice $(\mathcal{L}, \sqsubseteq)$.

- A is a (finite) set of atomic propositions, otherwise referred to as variables (e.g. 
power or milk in the example in Figure 1(b)). 

- S is a (finite) set of states. States are not explicitly labeled - each state is uniquely 
identified by its variable/value mapping. Thus, two states cannot have the same 
mapping. However, we sometimes use state labels as a shorthand for the respective 
vector of values, as we did in the coffee dispenser example. 

- $S_0 \subseteq S$ is the non-empty set of initial states.

- Each transition $(s,t)$ in M has a logical value in $\mathcal{L}$. Thus, $R : S \times S \to \mathcal{L}$ is a
total function assigning a truth value from the logic L to each possible transition
between states, including self-loops. Note that a XKripke structure is a completely
connected graph. We also require that each state has at least one non-false transition
coming out of it.

- $I : S \times A \to \mathcal{L}$ is a total function that maps a state $s$ and an atomic proposition
(variable) $a$ to a truth value $\ell$ of the logic. For simplicity we assume that all our
variables are of the same type, ranging over the values of the logic. For a given
variable $a$, we will write $I$ as $I_a : S \to \mathcal{L}$. For symbolic model checking, we
compute partitions of the state space w.r.t. a variable $a$ using $I_a^{-1} : \mathcal{L} \to 2^S$. A
partition has the following properties:

$\forall a \in A, \forall \ell_i, \ell_j \in \mathcal{L} : i \neq j \Rightarrow (I_a^{-1}(\ell_i) \cap I_a^{-1}(\ell_j) = \emptyset)$ (disjointness)

$\forall a \in A, \forall s \in S, \exists \ell_i \in \mathcal{L} : s \in I_a^{-1}(\ell_i)$ (cover)
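The partition induced by a variable can be sketched in a few lines of Python. The fragment below uses only the two state vectors that the paper spells out explicitly (COFFEE and FOAM, under the variable ordering power, water, milk, cup); the remaining states of Figure 1(b) are omitted, so this is a partial model for illustration. The names `LATTICE`, `VARS`, `STATES`, `interp` and `partition` are hypothetical.

```python
# Truth values of the six-valued logic (only T and F occur in this fragment).
LATTICE = ["F", "N", "DK", "DC", "S", "T"]
VARS = ("power", "water", "milk", "cup")

# Two states of the coffee dispenser, given as value vectors over VARS.
STATES = {
    "COFFEE": ("T", "T", "F", "T"),
    "FOAM": ("T", "F", "T", "T"),
}


def interp(state, var):
    """The labeling function I(s, a): value of variable `var` in `state`."""
    return STATES[state][VARS.index(var)]


def partition(var):
    """The inverse labeling: maps each truth value to the set of states
    in which `var` takes that value."""
    blocks = {value: set() for value in LATTICE}
    for state in STATES:
        blocks[interp(state, var)].add(state)
    return blocks


water_blocks = partition("water")
```

Because every state is placed in exactly one block, the disjointness and cover properties hold by construction; empty blocks are kept so that the mapping stays total over the lattice.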

4.2 Multi-valued CTL 

Here we give semantics of CTL operators on a XKripke structure M over a quasi-boolean
logic L. We will refer to this language as multi-valued CTL, or XCTL. L is
described by a finite, quasi-boolean lattice $(\mathcal{L}, \sqsubseteq)$, and thus the conjunction $\sqcap$, disjunction
$\sqcup$ and negation $\neg$ operations are available. In extending the CTL operators, we want
to ensure that the expected CTL properties, given in Figure 2, are still preserved. Note
that the AU fixpoint is somewhat unusual because it includes an additional conjunct,
$EX\,A[\varphi\,U\,\psi]$. This additional term preserves a “strong until” semantics for states that
have no outgoing T transitions [4].

$\neg AX\,\varphi = EX(\neg\varphi)$ (negation of “next”)

$A[\bot\,U\,\varphi] = E[\bot\,U\,\varphi] = \varphi$ ($\bot$ “until”)

$A[\varphi\,U\,\psi] = \psi \lor (\varphi \land AX\,A[\varphi\,U\,\psi] \land EX\,A[\varphi\,U\,\psi])$ (AU fixpoint)

$E[\varphi\,U\,\psi] = \psi \lor (\varphi \land EX\,E[\varphi\,U\,\psi])$ (EU fixpoint)

Fig. 2. Properties of CTL operators.

We first give the semantics of the propositional operators. We extend the domain
of the interpretation function $I$ to any CTL formula $\varphi$. For a model M, we use $P^M_\varphi(s)$
to denote the truth value that formula $\varphi$ takes in state $s$. We omit M if it is clear from
context. If $s \in S$ is a state, $a \in A$ is a variable, and $\varphi$ and $\psi$ are CTL formulae:

$P_a(s) = I(s, a) \qquad P_{\varphi \land \psi}(s) = P_\varphi(s) \land P_\psi(s)$

$P_{\neg\varphi}(s) = \neg P_\varphi(s) \qquad P_{\varphi \lor \psi}(s) = P_\varphi(s) \lor P_\psi(s)$

We proceed by defining the EX operator. In standard CTL, EX is defined using
existential quantification over next states. We extend the notion of quantification for
multi-valued reasoning by using conjunction and disjunction for universal and existential
quantification, respectively. This treatment of quantification is standard [1,18]. The
semantics of the EX operator is

$P_{EX\varphi}(s) = \bigvee_{t \in S} (R(s,t) \land P_\varphi(t))$

The definitions of AU, EU and AX are given using the properties in Figure 2:

$P_{AX\varphi}(s) = \neg P_{EX\neg\varphi}(s)$

$P_{E[\varphi U \psi]}(s) = P_\psi(s) \lor (P_\varphi(s) \land P_{EX\,E[\varphi U \psi]}(s))$

$P_{A[\varphi U \psi]}(s) = P_\psi(s) \lor (P_\varphi(s) \land P_{AX\,A[\varphi U \psi]}(s) \land P_{EX\,A[\varphi U \psi]}(s))$

The remaining CTL operators, $AF(\varphi)$, $EF(\varphi)$, $AG(\varphi)$, $EG(\varphi)$, $A[\varphi R \psi]$, $E[\varphi R \psi]$
are the abbreviations for $A[\top U \varphi]$, $E[\top U \varphi]$, $\neg EF(\neg\varphi)$, $\neg AF(\neg\varphi)$, $\neg E[\neg\varphi\,U\,\neg\psi]$,
$\neg A[\neg\varphi\,U\,\neg\psi]$, respectively.
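The semantics above can be exercised directly on an explicit (non-symbolic) model. The sketch below is a minimal Python rendering under stated assumptions: it uses the three-valued chain F ⊑ M ⊑ T encoded as integers (so meet is `min`, join is `max`, and negation is reflection), and a three-state toy model that is invented for illustration and does not come from the paper. The function names `ex`, `ax`, `eu` and `au` are likewise illustrative; this is not the symbolic algorithm used by Xchek.

```python
# Three-valued chain F < M < T; negation reflects the chain about M.
F, M, T = 0, 1, 2


def neg(x):
    return T - x


def ex(R, P, states):
    """P_EX(s) = join over t of (R(s,t) meet P(t))."""
    return {s: max(min(R[s][t], P[t]) for t in states) for s in states}


def ax(R, P, states):
    """P_AX(s) = neg P_EX(neg P)(s)."""
    e = ex(R, {s: neg(P[s]) for s in states}, states)
    return {s: neg(e[s]) for s in states}


def eu(R, P_phi, P_psi, states):
    """Least fixpoint of Z = psi or (phi and EX Z), starting from psi."""
    cur = dict(P_psi)
    while True:
        e = ex(R, cur, states)
        nxt = {s: max(P_psi[s], min(P_phi[s], e[s])) for s in states}
        if nxt == cur:
            return cur
        cur = nxt


def au(R, P_phi, P_psi, states):
    """Fixpoint of Z = psi or (phi and AX Z and EX Z), starting from psi."""
    cur = dict(P_psi)
    while True:
        a, e = ax(R, cur, states), ex(R, cur, states)
        nxt = {s: max(P_psi[s], min(P_phi[s], a[s], e[s])) for s in states}
        if nxt == cur:
            return cur
        cur = nxt


# Hypothetical toy model: s0 "maybe" moves to s1, s1 surely moves to s2,
# and s2 loops. Every state has at least one non-false outgoing transition.
states = ("s0", "s1", "s2")
R = {
    "s0": {"s0": F, "s1": M, "s2": F},
    "s1": {"s0": F, "s1": F, "s2": T},
    "s2": {"s0": F, "s1": F, "s2": T},
}
goal = {"s0": F, "s1": F, "s2": T}
top = {s: T for s in states}

ef_goal = eu(R, top, goal, states)  # EF goal = E[T U goal]
af_goal = au(R, top, goal, states)  # AF goal = A[T U goal]
```

In this toy model the only way to reach `s2` from `s0` passes through the M-valued transition, so both `ef_goal["s0"]` and `af_goal["s0"]` come out as M.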

The properties of the coffee dispenser in Figure 1(b), given in Section 2, can be 
formalized in XCTL as follows 

1. $EF(\mathit{water})$ The expected answer is T.

2. $EF(\mathit{milk})$ The expected answer is S.

3. $AG(\mathit{water} \to \mathit{cup})$ The expected answer is T.

4. $AG(\mathit{water} \to AX\,A[\neg\mathit{water}\;W\;(\neg\mathit{cup} \land \neg\mathit{water})])$² The expected answer is T.



5 Symbolic Multi-valued Model Checker 

In this section we describe the implementation of our symbolic multi-valued model
checker, Xchek. The architecture of Xchek is shown in Figure 3. Xchek takes as input a
model M with variable and transition values from a lattice $\mathcal{L}$, and a XCTL formula $\varphi$. It
outputs a total mapping from $\mathcal{L}$ to sets of states in $S$, indicating in which states $\varphi$ takes
each value $\ell$. This is simply $P_\varphi^{-1}$, the inverse of the valuation function defined above.
Thus, the task of the model checker is to compute $P_\varphi$ given the transition function $R$.

² We use the operator while defined as $A[x\;W\;y] = \neg E[\neg y\;U\;(\neg x \land \neg y)]$.

Fig. 3. Architecture of Xchek.

Since states are assignments of values to the variables, an arbitrary ordering imposed
on $A$ allows us to consider a state as a vector in $\mathcal{L}^n$, where $n = |A|$. Hence $P_\varphi$ and $R$ can
be thought of as functions of type $\mathcal{L}^n \to \mathcal{L}$ and $\mathcal{L}^{2n} \to \mathcal{L}$ respectively. Such functions
are represented within the model checker by multi-valued decision diagrams (MDDs),
a multi-valued extension of the binary decision diagrams (BDDs) [3].

As an example, consider the coffee dispenser shown in Figure 1(b). Using the variable
ordering (power, water, milk, cup), the state labeled COFFEE is just the vector
$s = (T, T, F, T)$, the one labeled FOAM is $t = (T, F, T, T)$, and the existence of a T-valued
transition between them is expressed by the fact that $R(T, T, F, T, T, F, T, T) = T$ or,
more compactly, $R(s,t) = T$.

Xchek uses two supplementary libraries: a library for handling quasi-boolean lat- 
tices and an MDD library. The former includes functions to compute unions and inter- 
sections of sets of logical values, determine whether given lattices have some desired 
properties, e.g., distributivity, and to support various lattice-based calculations. Our li- 
brary is based on Freese’s Lisp lattice library [11]. The MDD library is described below. 

5.1 Data Structures 

There is an extensive literature dealing with MDDs [21], mostly in the field of circuit 
design. To our knowledge, the logics used in that literature are given by total orders 
(such as the integers modulo n) and not by arbitrary quasi-boolean lattices, but we 
concede that this is a minor difference. Also, as far as we know, they have not been 
used in formal verification before, so for the purposes of this paper we will describe 
them briefly. We will assume a basic knowledge of BDDs [3]. 

The basic notion in the construction of binary decision diagrams is the Shannon
expansion. A boolean function $f$ of $n$ variables can be expressed relative to a variable
$a_0$, by computing $f$ on $n - 1$ variables with $a_0$ set to $\top$, and the same function with $a_0$
set to $\bot$. These functions are referred to as $f_\top$ and $f_\bot$, respectively. We write this
expansion as $f(a_0, \ldots, a_{n-1}) \to f_\top(a_1, \ldots, a_{n-1}),\ f_\bot(a_1, \ldots, a_{n-1})$. This notion
is generalized as follows:

Definition 5. [21] Given a finite domain $D$, the generalized Shannon expansion of a
function $f : D^n \to D$, with respect to the first variable in the ordering, is:

$f(a_0, a_1, \ldots, a_{n-1}) \to f_0(a_1, \ldots, a_{n-1}), \ldots, f_{|D|-1}(a_1, \ldots, a_{n-1})$

where $f_i = f[a_0/d_i]$, the function obtained by substituting the literal $d_i \in D$ for $a_0$ in
$f$. These functions are called cofactors.
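To make the generalized Shannon expansion concrete, the following Python sketch computes the cofactors of a function given as an explicit table. It assumes the three-element chain F ⊑ M ⊑ T as the domain and conjunction of two variables as the function; the names `D`, `RANK`, `conj`, `table` and `cofactors` are illustrative.

```python
from itertools import product

# Domain in a fixed order d_0, d_1, d_2; RANK encodes the chain F < M < T.
D = ("F", "M", "T")
RANK = {value: i for i, value in enumerate(D)}


def conj(x, y):
    """Meet on a chain: the smaller of the two values."""
    return x if RANK[x] <= RANK[y] else y


def table(f, n):
    """Function table of an n-ary function over D."""
    return {args: f(*args) for args in product(D, repeat=n)}


def cofactors(tab):
    """Expand on the first variable: the i-th result is the table of
    f with the first argument fixed to D[i]."""
    return [
        {args[1:]: value for args, value in tab.items() if args[0] == d}
        for d in D
    ]


f_table = table(conj, 2)
f0, f1, f2 = cofactors(f_table)
```

Here `f0` is the constant F, `f1` never exceeds M, and `f2` is the identity on the second variable, which is what one expects from fixing the first conjunct to F, M and T in turn.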

Definition 6. Assuming a finite set $D$, and an ordered set of variables $A$, a multi-valued
decision diagram (MDD) is a tuple $(V, E, var, child, image, value)$ where:

- $V = V_t \cup V_n$ is a set of nodes, where $V_t$ and $V_n$ indicate a set of terminal and
non-terminal nodes, respectively;

- $E \subseteq V \times V$ is a set of directed edges;

- $var : V_n \to A$ is a variable labeling function;

- $child : V_n \times D \to V$ is an indexed successor function for nonterminal nodes;

- $image : V \to 2^D$ is a total function that maps a node to a set of values reachable
from it;

- $value : V_t \to D$ is a total function that maps each terminal node to a logical value.

We describe constraints on the elements of an MDD below. Although $D$ may be any
finite set, for our purposes we are interested only in lattices; so instead of $D$ we will
refer to elements of the finite lattice $(\mathcal{L}, \sqsubseteq)$ modeling a logic.

Consider the function $f = x_1 \land x_2$, with $\ell_0 = F$, $\ell_1 = M$, $\ell_2 = T$. The MDD for this
expression is shown in Figure 4(b). The diagram is constructed by Shannon expansion,
first with respect to $x_1$, and then (for each cofactor of $f$) with respect to $x_2$. The dashed
arrows indicate $f$ and its cofactors, and also the cofactors of the cofactors.



Fig. 4. (a) A three-valued lattice with elements F, M and T. (b) The MDD for $f = x_1 \land x_2$ in this lattice.



The following properties hold for all MDDs:

$\forall u_0 \in V_n : out(u_0) = |\mathcal{L}| \;\land\; \forall u_1 \in V_t : out(u_1) = 0$ (semantics of nodes)

$\forall u_0, u_1 \in V : (u_0, u_1) \in E \Leftrightarrow \exists \ell \in \mathcal{L} : child_\ell(u_0) = u_1$ (semantics of edges)

where $out(u)$ stands for the number of non-null children of $u$. Several further properties
are required for the data structure to be usable:

$\forall u_0, u_1 \in V_n : (u_0, u_1) \in E \land var(u_0) = a_i \land var(u_1) = a_j \Rightarrow i < j$ (orderedness)

$\forall u_0, u_1 \in V : f^{u_0} = f^{u_1} \Rightarrow u_0 = u_1$ (reducedness)

$\forall u_0, u_1 \in V_n : (var(u_0) = var(u_1)) \land (\forall \ell \in \mathcal{L} : child_\ell(u_0) = child_\ell(u_1)) \Rightarrow u_0 = u_1$ (uniqueness 1)

$\forall u_0, u_1 \in V_t : (value(u_0) = value(u_1)) \Rightarrow u_0 = u_1$ (uniqueness 2)

In general, the efficiency of decision diagrams, binary or multi-valued, comes from
the properties of reducedness and orderedness (defined above). Orderedness is also required
for termination of many algorithms on the diagrams. Uniqueness implies reducedness
[21]; MDDs will be unique by construction, and thus reduced.

Note that in general we do not distinguish between a single node in an MDD and the
subgraph rooted there, referring to both indiscriminately as $u$. The function computed
by an MDD is denoted $f^u : \mathcal{L}^n \to \mathcal{L}$, and is defined recursively as follows:

$u \in V_t \Rightarrow f^u(s_0, \ldots, s_{n-1}) = value(u)$ (terminal constants)

$u \in V_n \Rightarrow f^u(s_0, \ldots, s_{n-1}) = f^{child_{s_i}(u)}(s_0, \ldots, s_{i-1}, s_{i+1}, \ldots, s_{n-1})$
where $a_i = var(u)$ and $s \in \mathcal{L}^n$ (cofactor expansion)

Consider the MDD in Figure 4. To compute $f = x_1 \land x_2$ with $x_1 = T$ and $x_2 = M$
using this diagram, we want to find $f(s)$ where $s = (T, M)$. We begin at the root node.
Its var is $x_1$, so we pick out $s_1$, which is T, and descend to the node $child_T(f)$, indicated
by the arrow to $f_2$ (which represents the function $T \land x_2$). Now we compute $f_2(M)$ by
choosing $child_M(f_2)$, which is a node in $V_t$, so we stop and return M. Thus, we conclude
that $f(T, M) = M$.

We will be calculating equality, conjunction, disjunction, negation, and existential
quantification on the functions represented by MDDs. MDDs have the same useful
property as BDDs: given a variable ordering, there is precisely one MDD representation
of a function. This allows for constant-time checking of function equality.

Theorem 2. Canonicity [21] For any finite lattice (or finite set) $\mathcal{L}$, any nonnegative
integer $n$, and any function $f : \mathcal{L}^n \to \mathcal{L}$, there is exactly one reduced ordered MDD $u$
such that $f^u = f(a_0, \ldots, a_{n-1})$.

In the boolean case, BDDs allow for constant-time existential quantification, since
any node which is not a constant $\bot$ is satisfiable. In order to implement multi-valued
quantification efficiently, we introduce the image attribute of MDD nodes, which stores
the possible outputs of functions. The following properties hold for image:

$u \in V_t \Rightarrow image(u) = \{value(u)\}$ (image property 1)

$u \in V_n \Rightarrow image(u) = \bigcup_{\ell \in \mathcal{L}} image(child_\ell(u))$ (image property 2)

Definition 7. A function $f$ is $\ell$-satisfiable if some input yields $\ell$ as an output, or, equivalently,
$f^{-1}(\ell) \neq \emptyset$.

$(f^u)^{-1}(\ell) \neq \emptyset \Leftrightarrow \ell \in image(u)$ (correctness of image)

$(\exists s \in \mathcal{L}^n : f^u(s)) = (\bigvee_{s \in \mathcal{L}^n} f^u(s)) = (\bigvee_{\ell \in image(u)} \ell)$ (existential quantification)

To demonstrate how existential quantification works, we refer again to the example
in Figure 4, and compute $\exists x_2 : x_1 \land x_2$. There are two nodes labeled with $x_2$ to be
dealt with. By inspection we see that $image(f_1) = \{F, M\}$ and $image(f_2) = \{F, M, T\}$.
So $f_1$ is replaced with the terminal node $F \lor M = M$, and $f_2$ with the terminal node
$F \lor M \lor T = T$.

In general, algorithms for manipulating BDDs are easily extensible to the multi- 
valued case, provided they do not use any optimizations that depend on a two-valued 
boolean logic (e.g. complemented edges [20]). The differences are discussed below.

function MakeUnique(name, children)
    find (create if not found) a node u s.t.
        var(u) = name $\land$
        $\forall \ell$, child$_\ell$(u) = children($\ell$)
    return u

function Quantify(u, i)
    // existentially quantifies over all variables $a_j$ with $j \geq i$.
    if var(u) < i
        then foreach $\ell \in \mathcal{L}$
                 children($\ell$) := Quantify(child$_\ell$(u), i)
             return MakeUnique(var(u), children)
        else return Lattice.bigOR(image(u))

function Apply(op, $u_1$, $u_2$)
    // applies the lattice operation op to the MDDs $u_1$ and $u_2$
    global G = $|u_1| \times |u_2|$ array of int
    return Apply'(op, $u_1$, $u_2$)

function Apply'(op, $u_1$, $u_2$)
    // helper function for Apply which actually does the work
    if G[$u_1$][$u_2$] non-empty
        then return G[$u_1$][$u_2$]
    else
        if $u_1 \in \mathcal{L} \land u_2 \in \mathcal{L}$    // both are terminal nodes
            then u := Lattice.doOp($u_1$, $u_2$, op)
        else if var($u_1$) = var($u_2$)
            then foreach $\ell \in \mathcal{L}$
                     children($\ell$) := Apply'(op, child$_\ell$($u_1$), child$_\ell$($u_2$))
                 u := MakeUnique(var($u_1$), children)
        else if var($u_1$) < var($u_2$)
            then foreach $\ell \in \mathcal{L}$
                     children($\ell$) := Apply'(op, child$_\ell$($u_1$), $u_2$)
                 u := MakeUnique(var($u_1$), children)
        else
            foreach $\ell \in \mathcal{L}$
                children($\ell$) := Apply'(op, $u_1$, child$_\ell$($u_2$))
            u := MakeUnique(var($u_2$), children)
        G[$u_1$][$u_2$] := u
        return u

Fig. 5. The MDD algorithms MakeUnique, Quantify and Apply for binary operators.
Apply for unary operators is defined similarly.
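The algorithms of Figure 5 can be sketched in Python as a small hash-consed node store. This is an illustrative reading of the pseudocode rather than the Xchek implementation: the class name `MDD`, its methods, the use of a dictionary as the unique table, 0-based variable indices, and the way lattice operations are passed in as plain functions (`conj`, `big_or`) are all assumptions made for the example.

```python
class MDD:
    """Hash-consed multi-valued decision diagrams over a finite value set."""

    def __init__(self, values):
        self.values = list(values)  # lattice elements in a fixed child order
        self.unique = {}            # node key -> node id (the unique table)
        self.nodes = []             # node id -> ("term", value) or (var, children)
        self.image = []             # node id -> frozenset of reachable values

    def _intern(self, key, img):
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
            self.image.append(img)
        return self.unique[key]

    def terminal(self, value):
        return self._intern(("term", value), frozenset([value]))

    def make_unique(self, var, children):
        """Find (or create) the node with this variable and these children."""
        children = tuple(children)
        img = frozenset().union(*(self.image[c] for c in children))
        return self._intern((var, children), img)

    def is_terminal(self, u):
        return self.nodes[u][0] == "term"

    def var(self, u):
        # Terminals sort after every variable index.
        return float("inf") if self.is_terminal(u) else self.nodes[u][0]

    def build_var(self, i):
        """MDD of the projection onto variable index i."""
        return self.make_unique(i, [self.terminal(v) for v in self.values])

    def apply(self, op, u1, u2, cache=None):
        """Pointwise binary lattice operation on two MDDs."""
        cache = {} if cache is None else cache
        if (u1, u2) in cache:
            return cache[(u1, u2)]
        if self.is_terminal(u1) and self.is_terminal(u2):
            u = self.terminal(op(self.nodes[u1][1], self.nodes[u2][1]))
        else:
            v = min(self.var(u1), self.var(u2))
            kids = []
            for k in range(len(self.values)):
                c1 = self.nodes[u1][1][k] if self.var(u1) == v else u1
                c2 = self.nodes[u2][1][k] if self.var(u2) == v else u2
                kids.append(self.apply(op, c1, c2, cache))
            u = self.make_unique(v, kids)
        cache[(u1, u2)] = u
        return u

    def quantify(self, u, i, big_or):
        """Existentially quantify every variable with index >= i."""
        if self.var(u) < i:
            kids = [self.quantify(c, i, big_or) for c in self.nodes[u][1]]
            return self.make_unique(self.var(u), kids)
        return self.terminal(big_or(self.image[u]))

    def evaluate(self, u, s):
        """Follow the children selected by the state vector s."""
        while not self.is_terminal(u):
            var, kids = self.nodes[u]
            u = kids[self.values.index(s[var])]
        return self.nodes[u][1]


# Usage on the three-valued chain F < M < T with f = x1 and x2.
vals = ["F", "M", "T"]
rank = {v: i for i, v in enumerate(vals)}
conj = lambda a, b: a if rank[a] <= rank[b] else b
big_or = lambda img: max(img, key=rank.get)

m = MDD(vals)
x1, x2 = m.build_var(0), m.build_var(1)
f = m.apply(conj, x1, x2)
exists_x2 = m.quantify(f, 1, big_or)
```

Because nodes are interned, equality of functions reduces to comparing node ids: quantifying the second variable out of the conjunction yields the very node that represents the first variable. Note that this sketch, like `MakeUnique` in Figure 5, merges structurally identical nodes but does not remove nodes whose children all coincide.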

The most-used method in an MDD (or BDD) library is MakeUnique, defined in 
Figure 5. This guarantees uniqueness and thus reducedness [21]. MakeUnique is not a 
public method, but it is used by most of the public methods. 

The public methods required for model checking are: Build, to construct an MDD
based on a function table; Apply, to compute $\land$, $\lor$ and $\neg$ of MDDs; Quantify, to
existentially quantify over the primed variables; and AllSat to retrieve the computed
partition $P_\varphi^{-1}(\mathcal{L})$. Build ensures orderedness of MDDs while they are being constructed,
and Apply preserves it. Apply and Quantify are shown in Figure 5. Note how each
interfaces with the lattice library: Apply calls the method Lattice.doOp to compute $\land$
or $\lor$ of two terminal nodes, while Quantify requires Lattice.bigOR to compute the
disjunction of an MDD’s image-set.

function EX($P_\varphi$)
    return Quantify(Apply($\land$, $R$, Prime($P_\varphi$)), n)

function QUntil(quantifier, $P_\varphi$, $P_\psi$)
    $QU_0$ := $P_\psi$
    repeat
        if (quantifier is A)
            AXTerm$_{i+1}$ := Apply($\neg$, EX(Apply($\neg$, $QU_i$)))    // AX($QU_i$)
            EXTerm$_{i+1}$ := EX($QU_i$)
        else
            AXTerm$_{i+1}$ := $\top$
            EXTerm$_{i+1}$ := EX($QU_i$)
        $QU_{i+1}$ := Apply($\lor$, $P_\psi$, Apply($\land$, $P_\varphi$, Apply($\land$, EXTerm$_{i+1}$, AXTerm$_{i+1}$)))
    until $QU_{i+1} = QU_i$
    return $QU_{i+1}$

procedure Check(p, M)
    Case
        $p \in A$:                      return Build(p)
        $p = \neg\varphi$:              return Apply($\neg$, Check($\varphi$, M))
        $p = \varphi \land \psi$:       return Apply($\land$, Check($\varphi$, M), Check($\psi$, M))
        $p = \varphi \lor \psi$:        return Apply($\lor$, Check($\varphi$, M), Check($\psi$, M))
        $p = EX\,\varphi$:              return EX(Check($\varphi$, M))
        $p = AX\,\varphi$:              return Apply($\neg$, EX(Apply($\neg$, Check($\varphi$, M))))
        $p = E[\varphi\,U\,\psi]$:      return QUntil(E, Check($\varphi$, M), Check($\psi$, M))
        $p = A[\varphi\,U\,\psi]$:      return QUntil(A, Check($\varphi$, M), Check($\psi$, M))

Fig. 6. The multi-valued symbolic model checking algorithm.

An additional function, Prime, primes all of the variables in an MDD. In general,
primed and unprimed variables may be mixed in the variable ordering, but for the purposes
of this presentation, primed variables are always higher in the ordering. For instance,
$(a, c, b, a', c', b')$ is an acceptable variable ordering, but $(a, a', b, b', c, c')$ is
not. Quantify will still work in the more general case, but some preliminary variable
reordering will be needed.

5.2 The Model Checker 

Symbolic model checkers for boolean logic [7,4] are naturally extended to the multi- 
valued case. The model checker presented here is a symbolic version of the multi-valued 
model checker described in [6] . 

The full model checking algorithm is described in Figure 6. The function EX($P_\varphi$)
computes $P_{EX\varphi}$ symbolically; QUntil carries out the fixed-point computation of both
AU and EU. $AX\varphi$ is computed as $\neg EX \neg\varphi$. EG, AG, EF, AF, ER and AR are not
shown in Figure 6, but could be added as cases and defined in terms of calls to EX,
QUntil, and Apply.

Fig. 7. MDD for $R$, representing the transition relation for the coffee dispenser.

Proposition 1 The function EX($P_\varphi$) computes $P_{EX\varphi}$. That is, a state vector $s$ $\ell$-satisfies
Quantify($R \land P'_\varphi$, n) if and only if $(\bigvee_{t \in S} R(s, t) \land P_\varphi(t)) = \ell$.

To illustrate the algorithm, we compute the partition given by the XCTL formula
$EX(\mathit{cup})$ in the coffee dispenser example in Figure 1(b). This computation is equivalent,
in symbolic terms, to computing $\exists v'$, s.t. $R \land P_{\mathit{cup}'}$, where $v'$ is a primed state
vector; the intuition here is the quantification over all possible next states. This is
implemented in the model checker by the expression

Quantify(Apply($\land$, $R$, Prime($P_{\mathit{cup}}$)), n)

Not every n-ary function over an arbitrary quasi-boolean lattice can be written in the
relatively economical form of a propositional formula, and so we need to show either
an MDD or a function-table representation. We will use MDDs. The MDD for the
transition relation of the model ($R$) is shown in Figure 7. For clarity and to save space,
we used the following conventions: (a) all state variables are abbreviated to their initial







Marsha Chechik, Benet Devereux, and Steve Easterbrook 




Fig. 8. The MDD for $P_{\mathit{cup}'}$.

Fig. 9. The lowest level of the MDD for $R \land P_{\mathit{cup}'}$. Dotted lines indicate the transitions
which existed in Figure 7 and are now set to F.

letters; e.g., m stands for milk, w for water, etc; (b) the MDD is not reduced; (c) the
transitions to the terminal node F are not shown. Note that there are no transitions to
node DC in this diagram. The MDD for $P_{\mathit{cup}'}$ is given in Figure 8.

We start by computing $R \land P_{\mathit{cup}'}$. The top part of the MDD is the same as shown in
Figure 7. The bottom row is given in Figure 9. The conjunction of two MDDs with the
same variable in the root node is carried out by pairwise conjunction of their children;
for instance, consider the leftmost node in Figure 7 labeled with $\mathit{cup}'$, indicated by
the dashed box; its child$_{DC}$ is T, and its other children are F. The MDD for $\mathit{cup}'$ has
child$_\ell = \ell$ for all $\ell \in \mathcal{L}$. Their conjunction, then, has child$_\ell$ = F for any $\ell \in \mathcal{L}$ except
for DC; DC $\land$ T = DC, so the child is DC, as shown by the dashed box in Figure 9.

To complete the computation, the model checker needs to existentially quantify
over the primed variables. Quantify replaces all primed-variable nodes $u$ which are
immediate children of unprimed-variable nodes, with the constant node $\bigvee image(u)$.
For instance, we can see by inspection that the leftmost subgraph with $\mathit{power}'$ at the
root (which corresponds to the successor states of OFF) has DC, N, F in its image-set,
so it is replaced by the terminal node $\bigvee\{DC, N, F\} = DC$; from this we conclude that
$EX\,\mathit{cup}$ has the value DC in state OFF, the model’s initial state.

The properties of the coffee dispenser example given in Section 2 and formalized in 
Section 4 can be model checked with the results given in Table 1 . 



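Num  Property                                                                                         Result  Comment
---  -----------------------------------------------------------------------------------------------  ------  ----------------------------
1.   $EF(\mathit{water})$                                                                             T       as expected
2.   $EF(\mathit{milk})$                                                                              S       as expected
3.   $AG(\mathit{water} \to \mathit{cup})$                                                            T       as expected
4.   $AG(\mathit{water} \to AX\,A[\neg\mathit{water}\;W\;(\neg\mathit{cup} \land \neg\mathit{water})])$   S       as $R$(COFFEE, FOAM) = S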

Table 1. Results of model checking the coffee dispenser.

MDD Method               Running Time                Notes
-----------------------  --------------------------  ---------------------------------------------------
MakeUnique(var, child)   $O(1)$                      Hash-table lookup.
Build($f$)               $O(|\mathcal{L}|^n)$        O(size of the function table to convert to MDD).
Apply(op, $u_1$, $u_2$)  $O(|u_1| \times |u_2|)$     The worst-case is pairwise conjunction of every node
                                                     in $u_1$ with every node in $u_2$.
Quantify($u$, $i$)       $O(|u|)$                    Depth-first traversal of the graph.
Prime($u$)               $O(|u|)$                    Same as above.

Table 2. MDD methods used for model checking, and their running times.



5.3 Analysis 

Table 2 shows the running times of MDD operations used by the model checker in terms
of $|u|$, the size of the MDD. In the worst case, this size is $O(|\mathcal{L}|^n)$ [21].

The running times of MDD methods Build and Apply critically depend on the
sizes of MDD structures. In order to form a rough estimate of these sizes in the average
case, we ran the MDD library on several test sets, with results shown in Figure 10.
Each data point in the graph stands for a set of 200 MDDs, each representing a function
generated by filling in a random value (chosen from a uniform distribution) from $\mathcal{L}$
for each possible input. We generated one such set for 3, 4, and 5 variables for lattices
ranging in size from 3 to 8 (the x-axis of the graph), and took the average size of the
MDDs representing the functions. The figure shows the worst-case size $|\mathcal{L}|^n$, and our
experimental results, for $n = 4$ and $n = 5$; $n = 3$ is similar.
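An experiment of this kind can be approximated without a full MDD library by counting distinct subfunctions of a random function table. The Python sketch below does so under a stated counting convention, which is an assumption and may differ from the one behind Figure 10: one node per distinct subfunction at each variable level, plus one terminal per distinct output value, with no elimination of nodes whose children coincide. The function names `mdd_size`, `worst_case` and `average_size` are illustrative.

```python
import random


def mdd_size(table, base, n):
    """Number of nodes in the ordered MDD of a function table.

    `table` lists the outputs for all base**n inputs in lexicographic
    order, first variable most significant. Level k holds one node per
    distinct sub-table obtained by fixing the first k variables; level n
    holds the terminals.
    """
    assert len(table) == base ** n
    size = 0
    for level in range(n + 1):
        width = base ** (n - level)
        chunks = {tuple(table[i:i + width]) for i in range(0, len(table), width)}
        size += len(chunks)
    return size


def worst_case(base, n):
    """Full tree of non-terminals plus one terminal per lattice value."""
    return sum(base ** k for k in range(n)) + base


def average_size(base, n, samples=200, seed=0):
    """Average MDD size over uniformly random functions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        table = [rng.randrange(base) for _ in range(base ** n)]
        total += mdd_size(table, base, n)
    return total / samples


# Conjunction on the three-valued chain, with values encoded as 0 < 1 < 2.
conj_table = [min(a, b) for a in range(3) for b in range(3)]
conj_size = mdd_size(conj_table, 3, 2)
```

Calling `average_size(size, n)` for lattice sizes 3 to 8 and `n` in 3, 4, 5 gives one data point per configuration, to be compared against `worst_case(size, n)`. The numbers obtained this way depend on the counting convention and the random seed, so they are not expected to reproduce the published plot exactly.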





Fig. 10. Worst-case and experimental average-case sizes of MDDs plotted against lattice
size: (a) $n = 4$, (b) $n = 5$.

The results show the average size of the generated MDDs to be roughly a
linear improvement over the worst-case $O(|\mathcal{L}|^n)$. We recognize the weakness of this
methodology: that it does not give a good idea of how the structure of the problem 
affects the size of MDDs. We suspect that the structure of the model checking problem 
results in a somewhat better improvement, but do not yet have adequate benchmarks 
with which to test this hypothesis. In the future, we would like to perform the same test 
for an appropriate test suite of multi-valued models, to check whether the structure of 
the model checking task has an impact on the size of MDDs.

The running time of Xchek is dominated by the fixpoint computation of QUntil.
The proof of termination of this algorithm is based on each step of QUntil being a
monotonic lattice operator; a detailed proof can be found in [6]. The total number of
steps is bounded above by $|\mathcal{L}|^n \times h$ ($h$ is the height of the lattice $\mathcal{L}$), and the time of each
step is dominated by the time to compute the EXTerm and AXTerm, which is O ( |£ ; 

so the worst-case running time for Xchek is x h), where h is the height of the 

lattice. The results of Figure 10 suggest that in the average case, each step’s running 
time is O ( I £ , for an average termination time of x h x |p | ) , where \p\ 

is the size of the XCTL formula. 

At first glance, MDDs appear to be performing significantly worse than BDDs
($O(|\mathcal{L}|^n)$ versus $O(2^n)$ in the worst case). However, our multi-valued logics compactly
represent incompleteness in a model. For example, suppose we have a model with $n$
states and wish to differentiate between $p$ of those states ($p \ll n$) by introducing
an extra variable $a$. In classical model checking this uncertainty can only be handled
by duplicating each of $n - p$ states (one for each value of $a$). In fact, most of these
states are likely to be reachable; thus, the size of the state space nearly doubles. In the
multi-valued case, the reachable state-space will increase at most by $p$ states. This computation
did not take into account the presence of “unknown” transitions; these could
also be encoded into the binary representation, but would lead to a further state-space
increase. Thus, we expect that often our model checker would perform as well as the
classical one, and on some problems even better.

Finally, the scope of the applicability of an MDD-based model checker includes 
reasoning about inconsistent models. 



6 Conclusion and Future Work 

Multi-valued logics can be useful for describing models that contain incomplete in- 
formation or inconsistency. In this paper we presented an extension of classical CTL 
model checking to reasoning about arbitrary quasi-boolean logics. We also described 
an implementation of a symbolic multi-valued model checker Xchek and illustrated it 
using a simple coffee dispenser example. 

We plan to extend the work presented here in a number of directions to ensure that 
Xchek can effectively reason about non-trivial systems. We will start by addressing 
some of the limitations of our XKripke structures. In particular, so far we have assumed 
that our variables are of the same type, with elements described by values of the lat- 
tice associated with that machine. We need to generalize this approach to variables of 
different types. We are also working on generalizing our algorithm to verification of 
properties expressed in CTL*. 

In this paper we concentrated our attention on a purely symbolic model checker. 
The union, intersection, and quantification were computed using MDD operations. Al- 
ternatively, one can build a table-driven model checker, where such operations are ta- 
ble lookups. This model checker has the same running time as the MDD-based one. 
However, lattice-theoretic results can be used to significantly optimize the table-driven 
model checker. Our report on this work is forthcoming [5]. 
Abstract. Fair-cycle detection, a core problem in model checking, is 
solvable in linear time in the size of the design model using an explicit- 
state representation. Existing cycle-detection algorithms for symbolic 
model checking are quadratic or n log n time in the worst case and often 
inefficient in practice. Which default symbolic cycle-detection algorithm 
to implement in model checkers remains an open question. We compare 
several such algorithms based on the numbers of external and internal 
iterations and the numbers of image operations that they perform on 
both randomly generated and real examples. Unlike recent work by Ravi, 
Bloem, and Somenzi, we conclude that model checkers need to implement 
at least two generic cycle-detection algorithms: the traditional Emerson- 
Lei algorithm and one that evolved from our study, originally due to 
Hojati et al. We demonstrate that these two algorithms are complemen- 
tary, as the latter algorithm is provably incomparable to Emerson-Lei’s 
and often dominates it in practice. 



1 Introduction 

Model checking, whether for LTL, CTL, or ω-automata, has linear time complexity 
in the size of the design model. This well-known result follows from 
two facts: first, that most model checking techniques reduce to the problem 
of locating cycles through a given set of nodes in a graph [3,18]; second, that 
cycle detection is solvable in linear time using a depth-first search that identifies 
strongly-connected components (cf. [4]). This depth-first strategy provides 
a suitable approach to cycle detection in explicit-state model checking, and has 
been implemented in several tools [7,11]. 
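
As a rough illustration of the explicit-state approach described above, the following sketch finds fair cycles by computing strongly-connected components. It uses Kosaraju's two-pass method rather than a single depth-first search (both are linear time); the function names `sccs` and `has_fair_cycle` and the edge-list graph encoding are assumptions made for this example, not taken from any of the cited tools.

```python
def sccs(nodes, edges):
    """Strongly-connected components via Kosaraju's two-pass method."""
    succ = {u: [] for u in nodes}
    pred = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Pass 1: record nodes in order of DFS completion (iterative DFS).
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(succ[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        component, work = {root}, [root]
        assigned.add(root)
        while work:
            node = work.pop()
            for p in pred[node]:
                if p not in assigned:
                    assigned.add(p)
                    component.add(p)
                    work.append(p)
        components.append(component)
    return components


def has_fair_cycle(nodes, edges, fair_sets):
    """True iff some non-trivial SCC intersects every fair set."""
    edge_set = set(edges)
    for component in sccs(nodes, edges):
        state = next(iter(component))
        nontrivial = len(component) > 1 or (state, state) in edge_set
        if nontrivial and all(component & fair for fair in fair_sets):
            return True
    return False
```

Every state is visited a constant number of times, which is what makes this strategy linear; it also shows why the approach does not carry over to BDDs, since each step handles one state at a time.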

Depth-first approaches to cycle detection are not suitable for BDD-based 
symbolic model checking because BDDs represent sets of states while depth-first 
search examines individual states. Efficient BDD-based model checking requires 
efficient breadth-first, set-based cycle-detection algorithms. Most modern symbolic 
model checkers employ some variant of Emerson and Lei's symbolic cycle-detection 
algorithm [5]. CTL model checkers use the Emerson-Lei algorithm 

* Work partially supported by NSF Grant CCR-9988322 and a grant from the Intel 
Corporation. 

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 420-434, 2001. 

(c) Springer- Verlag Berlin Heidelberg 2001 
(henceforth EL) to process formulas of the form EG φ, which specify infinite 
paths on which every state satisfies φ. Linear-time model checkers compose the 
design model with an automaton representing the negation of the property, then 
check for cycles in the product automaton using the CTL formula EG true. 
Unfortunately, EL's time complexity is not linear in the size of the design model: the 
algorithm contains a doubly-nested fixpoint operator, and hence requires time 
quadratic in the design size in the worst case. The algorithm is also often slow 
in practice. EL is a so-called SCC-hull algorithm [16]. SCC-hull algorithms com- 
pute the set of states that contains all fair cycles. In contrast, SCC-enumeration 
algorithms enumerate all the strongly connected components of the state graph. 
While SCC-enumeration algorithms have a better worst-case complexity than 
SCC-hull algorithms [1], their performance in practice seems to be inferior to 
that of SCC-hull algorithms [16]. This paper focuses on SCC-hull algorithms. 

Researchers have proposed several alternatives to EL [8,10,14]. Ravi, Bloem, 
and Somenzi have presented both a classification scheme for such algorithms 
and an experimental comparison of several algorithms with EL [16]. They con- 
cluded that no algorithm consistently outperforms EL for cycle detection, and, 
consequently, there is no reason to “dethrone” EL as the default cycle-detection 
algorithm. Their comparison, however, is based primarily on running times, and 
secondarily on numbers of image operations. This approach has two significant 
drawbacks: it provides no useful feedback on why the algorithms behave as ob- 
served, and it suggests no techniques for predicting when one algorithm might 
outperform another. Furthermore, their comparison considers some algorithms 
that are based on post-image operations and some that are based on pre-image 
operations (as is EL), making it rather difficult to draw firm conclusions. 

This paper demonstrates a methodology that both addresses these concerns 
and identifies a symbolic cycle-detection algorithm that provides a viable alternative 
to EL. Ravi et al. present bounds on the number of image operations 
performed by various cycle-detection algorithms. We argue that to understand 
the performance of SCC-hull algorithms one needs to measure both the number 
of image computations as well as the number of external iterations (defined in 
Section 2). Our methodology focuses on the number of external iterations per- 
formed as a basis for comparing and refining symbolic cycle-detection algorithms. 
In aiming to balance the numbers of external and internal iterations performed, 
we have identified an algorithm that, as we argue, should join EL as a generic 
cycle-detection algorithm. We demonstrate that this algorithm is incomparable 
to EL, dominating it in many cases. Our conclusion is that, as in many other as- 
pects of model checking, there is no “best” cycle-detection algorithm and model 
checkers need to implement at least both EL and our algorithm. 

Section 2 describes our analyses of three existing symbolic cycle-detection 
algorithms and shows how the competitive algorithm evolved from these anal- 
yses. Section 3 presents experimental results on randomly generated and real 
examples for both the special case of terminal and weak systems and more gen- 
eral examples. Section 4 compares the competitive algorithm to a specialized 
cycle-detection algorithm for terminal and weak systems. Section 5 concludes. 
2 Symbolic Cycle-Detection Algorithms 

Cycle-detection algorithms in the context of model checking search for “bad” 
cycles in a directed graph representing a transition system modeling a design 
undergoing verification. Two parameters specify which cycles are considered bad: 
the invariant and the fair sets. The invariant specifies a condition, such as a 
propositional formula, that must be true of every state on a bad cycle. The fair 
sets specify sets of states that every bad cycle must pass through. We write 
EG_fair φ to indicate a search for cycles satisfying invariant φ and passing through 
fair sets fair. We will omit the fair annotation when all states are considered fair. 

Cycle detection in BDD-based model checking is challenging because the 
BDDs co-mingle information about different paths through a design. Symbolic 
cycle-detection algorithms maintain a set of states that may lead to bad cycles; 
this set is conservative, in that it contains all states that do lead to bad cycles. 
We call this the approximation set. The algorithms repeatedly refine the approx- 
imation set by locating and removing states that cannot lead to a bad cycle; we 
call this the pruning step. If a state lies on a bad cycle, then it must have a suc- 
cessor and a predecessor on that same cycle (and thus also in the approximation 
set). Cycle-detection algorithms use this information in different ways. 

Formally, these algorithms search for cycles in nondeterministic transition 
systems. A transition system is a tuple ⟨Q, Q_0, R, F⟩, where Q is a set of states, 
Q_0 ⊆ Q is the initial state set, R ⊆ Q × Q is the transition relation, and F ⊆ Q 
is the set of fair states. A transition system is weak iff (1) there exists a partition 
of Q into sets Q_1, ..., Q_n such that each Q_i is either contained in F or is disjoint 
from it, and (2) the Q_i's are partially ordered so that there is no transition from 
Q_i to Q_j unless Q_i ≤ Q_j. If the Q_i's contained in F are the maximal elements of 
the partial order, a weak system is called terminal. This definition of weak and 
terminal transition systems is due to Bloem, Ravi, and Somenzi [2], as refined 
from Kupferman and Vardi [15]. In model checking, designs commonly have 
several fair sets, and bad cycles must pass through each fair set. Such designs 
are outside the scope of weak systems, whose definition is only meaningful for 
one fair set.^ 
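
The two conditions in the definition of weak systems can be checked mechanically once a candidate partition is given. The sketch below is only an illustration: the function names, the representation of the partition as a list of sets, and the assumption that the list order is a linearization of the partial order (earlier blocks are lower) are all choices made for this example.

```python
def is_weak(blocks, edges, fair):
    """Check conditions (1) and (2) for a candidate partition.

    `blocks` is a list of disjoint sets covering all states, assumed to be
    listed in an order compatible with the partial order; `edges` is a set
    of (source, target) pairs; `fair` is the single fair set.
    """
    index = {}
    for i, block in enumerate(blocks):
        # Condition (1): each block lies inside the fair set or avoids it.
        if not (block <= fair or block.isdisjoint(fair)):
            return False
        for state in block:
            index[state] = i
    # Condition (2): transitions never lead to a lower block.
    return all(index[u] <= index[v] for (u, v) in edges)


def is_terminal(blocks, edges, fair):
    """A weak system is terminal if no transition leaves a fair block."""
    if not is_weak(blocks, edges, fair):
        return False
    index = {s: i for i, block in enumerate(blocks) for s in block}
    return all(index[u] == index[v] for (u, v) in edges if u in fair)
```

For example, with blocks `[{0}, {1, 2}]`, transitions `{(0, 1), (1, 2), (2, 1)}`, and fair set `{1, 2}`, both checks succeed; adding the transition `(2, 0)` violates condition (2).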

EL appears in Figure 1 (left).^ At each iteration through the while loop, 
EL computes the set of states that can reach every fair set via a non-trivial 
path contained in the approximation set, b. We call these iterations external; 
the reachability computations (the EU formula) form the internal iterations. EL 
does most of its work in the internal iterations: each external iteration performs 
only one preimage computation per fair set outside of the internal iterations. 
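
To make the distinction between external and internal iterations concrete, here is a minimal sketch of EL in which ordinary Python sets stand in for BDDs. The function names, the `counts` dictionary, and the edge-set encoding are assumptions made for this example; a real symbolic implementation would perform the same set operations on BDDs.

```python
def pre_image(edges, targets):
    """EX: states with at least one successor in `targets`."""
    return {u for (u, v) in edges if v in targets}


def eu(edges, b, target, counts):
    """E[b U target]: least fixpoint computed by repeated pre-images."""
    reach = set(target)
    while True:
        counts["images"] += 1          # one internal iteration
        grown = reach | (b & pre_image(edges, reach))
        if grown == reach:
            return reach
        reach = grown


def emerson_lei(edges, invariant, fair_sets):
    """Return the EL approximation set and the iteration counts."""
    b = set(invariant)
    counts = {"external": 0, "images": 0}
    changed = True
    while changed:
        counts["external"] += 1        # one external iteration
        old = set(b)
        for fair in fair_sets:
            d = eu(edges, b, fair & b, counts)
            counts["images"] += 1      # the single EX outside the EU
            b = b & pre_image(edges, d)
        changed = b != old
    return b, counts
```

On the transitions `{(0, 1), (1, 0), (1, 2), (2, 3), (4, 0)}` with all states satisfying the invariant and the single fair set `{0}`, the sketch returns `{0, 1, 4}` after two external iterations: the first removes the tail states 2 and 3, the second confirms the fixpoint.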

Hardin et al. attempted to reduce the number of external iterations that 
EL performs as a means of achieving an improved algorithm [8]. Their algorithm, 
called Catch-Them-Young (henceforth CTY), aggressively prunes the set 

^ LTL-to-automaton translation algorithms may yield multiple fair sets when one 
would suffice, rendering an otherwise weak system non-weak. Thus, minimizing the 
number of fair sets is an important optimization. 

^ Figure 1 shows VIS’ implementation of EL; in SMV, the final image computation 
(b := b A EX d) is outside the for loop. 
EL: 
    b := invariant ; 
    while b changes do 
        for each fair set F_i do 
            d := E[b U (F_i ∧ b)] ; 
            b := b ∧ EX d ; 

CTY: 
    b := invariant ; 
    while b changes do 
        for each fair set F_i do 
            F_i := F_i ∧ b ; 
            b := E[true U F_i] ∧ E[true S F_i] ; 
        while b changes do 
            b := b ∧ EX b ∧ EY b ; 
    res := EF b ; 

OWCTY: 
    b := invariant ; 
    while b changes do 
        for each fair set F_i do 
            F_i := F_i ∧ b ; 
            b := E[b U (b ∧ EX F_i)] ; 
        while b changes do 
            b := b ∧ EX b ; 


Fig. 1. The EL (left), CTY (middle), and OWCTY (right) cycle-detection algorithms. 
In CTY, EP F_i denotes all states that can reach F_i and EY b denotes 
the successors of b. A variant of CTY, CTY+, replaces “true” with b in the EU 
and ES computations. Each algorithm initializes the approximation set to states 
satisfying the invariant. 



of states potentially lying on bad cycles during the internal iterations (a closely 
related algorithm was proposed in [10]). This can reduce the number of external 
iterations by removing states during an external iteration that a later external 
iteration would otherwise handle in EL.^ The original CTY algorithm does cycle 
detection only; it does not compute EG as EL does. For consistency, Figure 1 
(middle) provides a version of CTY that can be used to compute EG; this entails 
one difference from the original algorithm: the extra EF computation in the last 
step of the algorithm. 

The external iterations in CTY perform two steps: first, compute the set of 
states that are both reachable from and can reach every fair set (the internal it- 
erations); second, repeatedly prune the approximation set until it is closed under 
both successors and predecessors. In contrast, EL prunes the approximation set 
only once and removes only states which have no successor in the approximation 
set; EL does not iterate the pruning step within one external iteration. CTY can 
eliminate states from the approximation set earlier than can EL, hence the name 
“Catch-Them-Young”. Like EL, CTY has quadratic time complexity with respect 
to the size of the design. Hardin et al.'s experimental results, conducted over a 
large set of randomly generated designs, were mixed; CTY tended to outperform 
EL when there was no bad cycle, but performed worse than EL in the presence of 
cycles [8]. CTY's aggressive pruning strategy succeeded in reducing the number of 
external iterations, but nevertheless incurred a noticeable performance penalty. 
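
The two steps of a CTY external iteration can be sketched with Python sets standing in for BDDs. This sketch follows the CTY+ variant (reachability restricted to the approximation set b), which keeps the approximation set shrinking monotonically. The helper names are assumptions made for this example, and the final backward reachability is restricted to invariant states, which is a choice of this sketch rather than something stated in the figure.

```python
def pre_image(edges, states):
    """EX: states with at least one successor in `states`."""
    return {u for (u, v) in edges if v in states}


def post_image(edges, states):
    """EY: successors of `states`."""
    return {v for (u, v) in edges if u in states}


def closure(step, seed, within):
    """Least fixpoint of Z = seed | (within & step(Z))."""
    reach = set(seed)
    while True:
        grown = reach | (within & step(reach))
        if grown == reach:
            return reach
        reach = grown


def cty_plus(edges, invariant, fair_sets):
    """Return (states satisfying EG_fair invariant, external iterations)."""
    b = set(invariant)
    fair_sets = [set(f) for f in fair_sets]
    external = 0
    while True:
        external += 1
        old = set(b)
        # Step 1: keep states that reach, and are reachable from, each fair set.
        for i in range(len(fair_sets)):
            fair_sets[i] &= b
            backward = closure(lambda s: pre_image(edges, s), fair_sets[i], b)
            forward = closure(lambda s: post_image(edges, s), fair_sets[i], b)
            b = backward & forward
        # Step 2: prune until b is closed under successors and predecessors.
        while True:
            pruned = b & pre_image(edges, b) & post_image(edges, b)
            if pruned == b:
                break
            b = pruned
        if b == old:
            break
    # Final backward reachability (assumption: only through invariant states).
    result = closure(lambda s: pre_image(edges, s), b, set(invariant))
    return result, external
```

Each pruning round costs both a pre-image and a post-image, which hints at why fewer external iterations need not translate into fewer image computations.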

In order to understand why CTY fails to outperform EL, we must examine each 
algorithm’s actual computations. This paper studies patterns of image compu- 
tations and external iterations, as the former are the most expensive operations 
in a BDD-based setting and the latter greatly impact the performance of cy- 
cle detection algorithms. Section 3 presents numeric data from this analysis. In 
summary, while CTY performs significantly fewer external iterations than EL, it 
does not reduce the number of image computations. In essence, EL does too little 
work outside the internal iterations whereas CTY does too much work overall. 
Engineering a better balance between the iterations might yield an algorithm 
that consistently outperforms both EL and CTY. One key difference between EL 

^ Though EL may eliminate states in earlier iterations than CTY. 
and CTY is that EL prunes based only on successors, whereas CTY considers both 
successors and predecessors. An intermediate approach could perform CTY's repeated 
pruning, but using only pre-image computations, as in EL [19]. This could 
greatly reduce the number of image computations of CTY, though perhaps at the 
expense of some additional external iterations. The resulting algorithm, called 
One-Way-Catch-Them-Young (henceforth OWCTY), appears in Figure 1 (right).^ 
OWCTY is essentially the pre-image version of Hojati et al.'s EL2 algorithm (sans 
an initial reachability computation) [10]; its pruning strategy is similar in spirit 
to that of Kesten et al.'s algorithm for cycle detection in the presence of strong 
fairness [14] (which uses forward instead of backward image operations). 
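
A minimal sketch of OWCTY, again with Python sets standing in for BDDs, shows that only pre-image computations are involved. The function name, the `counts` dictionary, and the edge-set encoding are assumptions made for this example.

```python
def owcty(edges, invariant, fair_sets):
    """Return the OWCTY approximation set and the iteration counts."""
    b = set(invariant)
    fair_sets = [set(f) for f in fair_sets]
    counts = {"external": 0, "images": 0}

    def ex(states):
        # Pre-image (EX); every call counts as one image computation.
        counts["images"] += 1
        return {u for (u, v) in edges if v in states}

    while True:
        counts["external"] += 1
        old = set(b)
        for i in range(len(fair_sets)):
            fair_sets[i] &= b
            # b := E[b U (b and EX F_i)]
            reach = b & ex(fair_sets[i])
            while True:
                grown = reach | (b & ex(reach))
                if grown == reach:
                    break
                reach = grown
            b = reach
        # Repeated one-way pruning: drop states with no successor in b.
        while True:
            pruned = b & ex(b)
            if pruned == b:
                break
            b = pruned
        if b == old:
            break
    return b, counts
```

On the transitions `{(0, 1), (1, 0), (1, 2), (2, 3), (4, 0)}` with fair set `{0}`, this returns `{0, 1, 4}` after two external iterations, the same set that a set-based EL computes on this input.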

How do OWCTY and EL compare? Hojati et al.'s experiments on a small 
set of small examples discussed only running time and were inconclusive for 
these two algorithms. Ravi et al.'s experiments compared EL and the forward-operator 
version of EL2/OWCTY; this is not too meaningful, since the issue of 
forward vs. backward reachability [9] is orthogonal to the balance between external 
and internal iterations (indeed, the upper bounds obtained in [16] for EL 
and forward-EL2 are incomparable). OWCTY's worst-case running time has only 
a linear overhead (see below) over the O(|F|dh) worst-case upper bound that 
Ravi et al. identified for EL [16] (where |F| is the number of fairness constraints, 
d is the diameter of the state graph, and h is the length of the longest reachable 
path in the SCC quotient graph). A worst-case analysis as done in [16] provides, 
however, only a very coarse comparison between the two algorithms. First, the 
overhead of OWCTY over EL is not very significant. Second, the worst-case in- 
stances for EL may be different than those for OWCTY, which means that the 
comparison of worst-case running times does not tell us how the two algorithms 
compare on a given input instance. A more meaningful analysis would compare 
how the two algorithms perform on concrete instances. Analysis at this level 
shows that the two algorithms are incomparable. Figure 2 illustrates the differ- 
ences between the EL and OWCTY pruning strategies; OWCTY outperforms EL on 
the first transition system, while EL outperforms OWCTY on the second. 




Fig. 2. Two transition systems that illustrate the differences between EL and 
OWCTY. Black circles denote fair states. All states satisfy the invariant. 



Consider the first transition system. Both algorithms eliminate the rightmost 
state in the first iteration and capture the remaining states in the approximation 
set. During the first iteration, OWCTY eliminates all but the leftmost fair state; 

^ A variant of OWCTY performs pruning inside the for loop; in practice, neither version 
consistently outperforms the other. 
EL eliminates only the rightmost fair state. EL requires an additional iteration to 
eliminate each of the four middle fair states. Each iteration involves a reachability 
computation that OWCTY does not perform. If the chain of fair states in the first 
system contained n fair states, OWCTY would perform O(n) image computations 
while EL would perform O(n²) image computations. Thus, EL has a quadratic 
overhead relative to OWCTY on such systems. 

Now consider the second transition system. In the first iteration, both al- 
gorithms eliminate the rightmost state and retain the remaining states in the 
approximation set. During the first iteration, EL throws away the rightmost fair 
state. The reachability computation in the second external iteration begins at 
the middle fair state; thus, EL eliminates the non-fair states between the right 
two fair states without traversing them explicitly again. OWCTY, in contrast, 
uses an additional image computation to eliminate each of those non-fair states. 
The second system currently contains two copies of a chain of states consisting 
of four non-fair states, followed by a fair state, followed by a non-fair state with a 
self loop. If the system had k consecutive copies of this chain, each with m states 
in the initial non-fair chain, EL would perform O(k²m) image computations as 
compared to OWCTY's O(k²m + km) = O(k²m) image computations. That is, 
the overhead of OWCTY relative to EL is only linear. 

In general, the two algorithms are incomparable with respect to their numbers 
of image computations. As OWCTY provably performs no more external iterations 
than EL, OWCTY's overhead (if it exists at all) is caused by the last line of the 
algorithm, which prunes the approximation set. Thus, OWCTY's overhead is at 
most linear relative to EL, while, as we saw, EL can have a quadratic overhead 
relative to OWCTY. 

To gain a better picture on the comparative performance of EL, CTY, and 
OWCTY, the experimental analyses in Section 3 gather data on the numbers of 
external iterations across several randomly generated and real examples; to complement 
the Ravi et al. study [16], we also include running time, memory usage, 
and BDD size statistics. Our analyses show that OWCTY requires almost the 
same number of external iterations as CTY with far fewer image computations; 
in practice, OWCTY almost always matches or improves on EL's performance. 

3 Comparative Analysis of the Algorithms 

3.1 Experiments on Random Systems 

Our first set of experiments compares the algorithms on random systems. We 
generate random systems by generating random directed graphs. We would like 
to obtain directed graphs with non-uniform out-degree and linear density (i.e., 
a linear number of edges in the number of nodes); linear density prevents cy- 
cle detection from becoming trivial due to an excess or paucity of edges. The 
following model of random graphs, due to Karp [13], satisfies these criteria: 

Definition 1 For each positive integer n and each p with 0 < p < 1, the sample 
space consists of all labeled digraphs with n vertices and edge probability p. 
Given a graph G with vertices V and edges E, the order of G is |V| and the 
density of G is |E|/|V|. We will use n and d to represent a graph's order and 
density, respectively. We wish to generate graphs in this space with edge 
probability d/n. Generating 
the graphs directly based on this model becomes time consuming as n grows 
larger: the procedure must decide whether to include each of the possible n(n − 1) 
edges based on the probability d/n. Instead, we fix the number of edges to be the 
expected number dn, and choose dn distinct edges from the n(n − 1) candidates. 
This approach provides a very good approximation to the given model [19]. 
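
The fixed-edge-count sampling just described might look as follows. This is a sketch under stated assumptions: the function name `random_digraph`, the rounding of dn to an integer, and the use of rejection sampling are choices made for this example, not details taken from the experiments.

```python
import random


def random_digraph(n, density, seed=None):
    """Sample round(density * n) distinct edges among the n(n - 1) candidates.

    Vertices are 0 .. n-1; self-loops are excluded, matching the count of
    n(n - 1) candidate edges.
    """
    rng = random.Random(seed)
    edge_count = round(density * n)
    if edge_count > n * (n - 1):
        raise ValueError("density too high for a simple digraph of this order")
    edges = set()
    # Rejection sampling: cheap while the graph is sparse (linear density).
    while len(edges) < edge_count:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:
            edges.add((u, v))
    return edges


def random_fair_set(n, fraction, seed=None):
    """Pick a fair set containing round(fraction * n) of the vertices."""
    rng = random.Random(seed)
    return set(rng.sample(range(n), round(fraction * n)))
```

For instance, `random_digraph(100, 1.2, seed=1)` yields 120 distinct edges, and `random_fair_set(100, 0.1, seed=1)` yields a fair set of 10 vertices.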

Our experiments compare four algorithms: EL, CTY, CTY+, and OWCTY. 
CTY+ is a variant of CTY that restricts the reachability computations to consider 
only paths through the approximation set, rather than through the entire state 
space as in CTY [19]; in other words, CTY+ replaces line 5 of CTY with b := 
E[b U F_i] ∧ E[b S F_i], where S is the past-time operator since. We present 
two sets of results. The first measures the number of external iterations that 
each algorithm performs, the next measures the number of image computations 
that each algorithm performs.^ The experiments use graphs with order 2^^ and 
densities varying over 1.2, 1.6, 2.0, and 2.4. This order is large enough to explore 
the behavior of the algorithms, yet small enough to analyze in a reasonable 
amount of time. We define a single fair set for each graph, with size varying over 
.01n, .1n, .3n, .5n, .7n, and .9n, where n is the digraph order. Each experiment 
fixes either the density or the size of the fair set and varies the other. The figures 
reported in the rest of this section are averaged over 100 individual experiments. 







|F|       .01n     .1n     .5n     .9n 
CTY       2.18    2.41    2.09    2.00 
CTY+      2.18    2.41    2.09    2.00 
OWCTY     2.17    2.37    2.07    2.00 
EL        2.66    5.36   13.20   20.89 

d          1.2     1.6     2.0     2.4 
CTY       2.00    2.00    2.00    2.00 
CTY+      2.00    2.00    2.00    2.00 
OWCTY     2.00    2.00    2.00    2.00 
EL       20.89   10.37    7.02    5.09 

Table 1. Average number of external iterations on digraphs with order 2^^. The 
left table fixes the density at 1.2 and varies the fair set size. The right table fixes 
the fair set size at .9 x 2^^ and varies the density. 



Table 1 shows the number of external iterations on digraphs with order 
n — 2^^. One set of experiments fixes the density at 1.2 and varies the fair 
set size; the other fixes the fair set size at .9 x 2^^ and varies the density. The ta- 
bles indicate that CTY, CTY+ and OWCTY perform far fewer external iterations 
than EL. Furthermore, OWCTY performs essentially the same number of external 
iterations as CTY; thus pruning based on predecessors as well as successors, as 
CTY does, does not significantly reduce the number of external iterations over a 
pruning strategy based only on successors. We therefore expect OWCTY to con- 
sume considerably fewer resources than CTY in practice. EL requires significantly 

^ We refer to post- and pre-image computations collectively as image computations. 
more external iterations as the fair set grows larger, and significantly fewer external 
iterations as the density increases. In contrast, CTY, CTY+, and OWCTY 
perform fairly consistent numbers of external iterations in both cases. 

The data in Table 1 do not indicate that CTY and OWCTY are more efficient 
than EL because the former algorithms may do more work in the internal 
iterations. The number of image computations offers a more precise efficiency 
comparison. Image computations are the most computationally expensive operations 
in each of the cycle-detection algorithms. The cost of these operations 
depends on the density and order of the underlying graphs [19]. Since we analyze 
the four algorithms over the same randomly generated graphs, the cost of individual 
image computations is comparable across the algorithms. The number of 
image computations is therefore a fair parameter for comparing the algorithms. 




Fig. 3. Number of image computations for EL, CTY, CTY+ and OWCTY. 



Figure 3 shows the number of image computations performed over graphs 
with order n = 2^^, density d = 1.2, and fair set size ranging over .01n, .1n, .3n, 
.5n, .7n, and .9n. For CTY, CTY+ and OWCTY the number of image computa- 
tions decreases as the fair set gets larger. CTY performs more image computations 
than CTY+ because CTY+ restricts reachability computations to the approxi- 
mation set, which allows the computation to converge faster. OWCTY performs 
fewer image computations than either CTY or CTY+ because it does not per- 
form forwards reachability. Separate data (not shown) show that the backwards 
reachability computations in OWCTY and CTY perform almost the same num- 
bers of image computations; furthermore, the pruning step in OWCTY performs 
roughly half as many image computations as that in CTY+ [19]. Thus, eliminating 
the forward image computations makes OWCTY less computationally expensive 
without adversely affecting the number of external iterations required. 

Separate experiments (not shown) show that the number of image compu- 
tations decreases sharply as the density increases [19]. In the case of EL, the 
number of image computations drops because the algorithm performs fewer ex- 
ternal iterations as density increases, as discussed previously. For the remaining 
three algorithms, our experimental data shows that the size of the approximation 
set after each iteration becomes larger as the density increases. The approxima- 
tion set determines the base set for subsequent reachability computations. The 
larger the base set, the faster reachability computations converge [19]. There- 
fore, fewer image computations are needed when the digraph density increases. 
Although each pruning step removes fewer vertices, the final approximation set 
is also larger, so the algorithms perform fewer image computations as density 
increases. Plots for running time statistics are similar to those for image compu- 
tations. In particular, both OWCTY and CTY consistently outperform EL. This 
contradicts the mixed results in other CTY versus EL experiments [8,16]. 



3.2 Experiments on Real Systems 

Our real design examples come from the VIS distribution and from Fabio Somenzi. 
They include an ethernet protocol with varying numbers of collisions before 
failure, a tree-structured arbiter with 8 nodes, a gcd circuit, a floating point 
multiplier, and two mutual exclusion protocols (bakery and eisenberg). These 
examples are written in Verilog and evaluated using the VIS model checker [17]. 
We implemented OWCTY within the VIS framework by replacing the original 
(EL) algorithm for evaluating EG formulas with OWCTY in a copy of VIS. We 
ran the experiments using VIS version 1.3 (with version 1.2 of the vl2mv com- 
piler), on an Intel 686 machine with 1GB of memory running RedHat Linux 
version 2.2.12-20; our VIS installation uses the CUDD BDD package. 

Table 2 summarizes experiments with LTL model checking of terminal and 
weak systems. For each LTL experiment, we evaluated EG_fair true on the product 
of the original design and a manually-constructed automaton for the negation of 
the property. Table 3 covers examples with multiple fair sets in the context of 
CTL model checking. Table 4 covers LTL model checking under multiple fairness 
constraints. In each table, stars on experiment names denote that the models 
contained cycles or that the property failed. The EX/EY and EU/ES figures count 
the number of image and reachability computations performed, respectively.^ 

The tables show that OWCTY generally matches or outperforms EL, while 
CTY and CTY+ are clearly not competitive. In many cases, OWCTY outperforms 
EL dramatically; in contrast, we have not yet found an example on which EL sig- 
nificantly outperforms OWCTY. The benefits of OWCTY are particularly evident 
on the ethernet and gcd examples in Table 2. As expected, OWCTY uses fewer 
external iterations than EL; however, OWCTY sometimes performs more image 
computations than EL. 



^ The EU/ES counts do not include trivial computations of the form [φ U φ]. 
Experiment   Property       Procedure   Ext.    EX or      Time       Mem     peak live 
                                        Iter.   EX/EY      (sec)      (MB)    BDD nodes 
ethernet 1   G(p → Fq)      EL          51      2179       356.6      13.6    339932 
                            CTY         2       42/43      187.9      14.6    398280 
                            CTY+        2       41/42      184.8      14.6    398280 
                            OWCTY       3       57         5.5        11.7    175118 
ethernet 2   G(p → Fq)      EL          107     6506       10656.1    14.4    367135 
                            CTY         2       67/68      1893.6     33.6    1365367 
                            CTY+        2       66/67      1887.6     33.6    1365755 
                            OWCTY       3       113        59.6       14.1    404723 
ethernet 3   G(p → Fq)      EL          171     11914      4371.3     13.7    279823 
                            CTY         2       95/96      1962.2     35      1456597 
                            CTY+        2       94/95      1938.0     35      1456597 
                            OWCTY       3       177        24.6       13.8    290593 
ethernet 4   G(p → Fq)      EL          -       -          (30H)      -       - 
                            CTY         2       130/131    5859.7     53.6    2320201 
                            CTY+        2       130/2      5895       53.6    2320201 
                            OWCTY       3       245        491.4      14.1    368225 
treearb 8*   G(p → Fq)      EL          8       75         6.2        13.6    234021 
                            CTY         -       -          (20M)      (23)    - 
                            CTY+        -       -          (20M)      (23)    - 
                            OWCTY       2       24         4.2        12.7    206640 
gcd          G(p → XFq)     EL          -       -          (37H)      -       - 
                            CTY         2       15/3       1384.2     59.3    2298351 
                            CTY+        2       14/2       1383.0     59.3    2298351 
                            OWCTY       2       24         2497.5     130.9   6285856 
fpmult       G(p → XXXq)    EL          2       18         18345.8    363     17667058 
                            CTY         2       26/3       33089.7    369     17619441 
                            CTY+        2       18/2       21994.7    368     17619441 
                            OWCTY       2       17         22457.2    369     17422253 

Table 2. LTL model checking on weak and terminal systems. Parenthesized 
times indicate terminated computations; M indicates minutes instead of seconds. 
Experiment   Property        Procedure   Num    Ext.    EX / EU or            Time      Mem     peak live 
                                         Fair   Iter.   EX/EY/EU/ES           (sec)     (MB)    BDD nodes 
bakery 1*    AG(p → AFq)     EL          6      18      554 / 91              1.3       6.2     34447 
                             CTY         6      11      1371/650/67/66        10        13.3    176492 
                             CTY+        6      12      344/299/51/50         7.0       13      182755 
                             OWCTY       6      18      516 / 75              1.6       6.1     36962 
bakery 2     AG(p → AFq)     EL          6      18      490 / 92              1.3       6.0     29524 
                             CTY         6      11      1239/614/67/66        9.4       13.3    176492 
                             CTY+        6      11      282/246/47/46         6.0       12.7    180657 
                             OWCTY       6      18      444 / 72              1.4       5.8     28849 
treearb8*    AG(p → AFq)     EL          8      15      382 / 106             14.8      13.6    328115 
                             CTY         8      -       -                     (194M)    (112)   - 
                             CTY+        8      -       -                     a70M)     (123)   - 
                             OWCTY       8      13      416 / 104             13.1      13.4    309449 
eisenberg2   AG(p → AFq)     EL          6      27      669 / 124             1.6       5.5     17352 
                             CTY         6      23      2159/2031/139/1381    7.8       11.2    180311 
                             CTY+        6      16      252/506/56/55         3.8       8.6     148353 
                             OWCTY       6      27      631 / 102             1.4       5.4     18504 
elevator*    AG(p → AFq)     EL          8      12      849 / 97              498.2     13.8    275914 
                             CTY         8      -       -                     (104M)    (38)    - 
                             CTY+        8      -       -                     (104M)    (43)    - 
                             OWCTY       8      12      861 / 79              536.8     13.6    275914 

Table 3. CTL model checking on systems with multiple fairness constraints. 



Experiment    Procedure  Num   Ext.   EX / EU or           Time      Mem      peak live
                         Fair  Iter.  EX/EY/EU/ES          (sec)     (MB)     BDD nodes
treearb8*     EL         9     15     1021 / 135           1397.8    13.8     239731
              CTY        9     -      -                    (186M)    (44)     -
              CTY+       9     -      -                    (207M)    (157)    -
G(p -> Fq)    OWCTY      9     14     1000 / 126           911.6     13.9     369062
eisenberg2    EL         7     24     1332 / 161           5.7       7.2      47704
              CTY        7     24     5114/5486/169/168    60.3      13.7     240028
              CTY+       7     15     229/399/53/52        4.7       8.8      147763
G(p -> Fq)    OWCTY      7     24     1197 / 109           5.3       7.1      59802
elevator3*    EL         3     2      7/1                  1164.7    87.5     4062730
              CTY        -     -      -                    (60M)     (270)    -
              CTY+       -     -      -                    (60M)     (270)    -
Gp            OWCTY      3     2      13/1                 1167.3    87.5     4062730
elevator4*    EL         1     2      3/1                  16192.4   282      13308496
              CTY        1     -      -                    (365M)    (278)    -
              CTY+       1     -      -                    067M)     (278)    -
Gp            OWCTY      1     2      5 / 1                16388.0   282      13308496

Table 4. LTL model checking on systems with multiple fairness constraints.
Exp.   Proc.   Num Fair  Ext. Iter.  EX    Time (sec)
A*     EL      2         6           203   65.49
       OWCTY   2         2           77    32.58
D*     EL      6         2           147   16.26
       OWCTY   6         2           149   16.33
E*     EL      4         3           125   6.89
       OWCTY   4         2           87    6.39
F*     EL      2         10          50    870.0
       OWCTY   2         2           27    897.7
H1*    EL      2         8           40    633.8
       OWCTY   2         2           23    495.7
H3*    EL      2         8           40    550.5
       OWCTY   2         2           23    592.7
R      EL      2         2           40    1004.5
       OWCTY   2         2           23    692.9
J1*    EL      2         8           40    521.9
       OWCTY   2         2           23    426.6
J2*    EL      2         8           40    447.9
       OWCTY   2         2           23    347.7
K*     EL      2         7           25    220.3
       OWCTY   2         2           20    165.3
L*     EL      2         6           24    129.4
       OWCTY   2         2           19    129.4
M1*    EL      2         7           35    81.5
       OWCTY   2         2           21    53.9

Table 5. Results from Intel on checking EG_fair true on systems that have (and
require) multiple fairness constraints.



Finally, we compared OWCTY and EL on Intel designs using internal Intel tools 
(Table 5). All the table entries reflect the composition of actual designs with 
linear-time properties, using multiple fairness constraints. OWCTY performed 
significantly better than EL in all examples except F and H3, where EL slightly 
outperformed OWCTY. 

4 OWCTY Versus Specialized Algorithms 

Our experimental results show that OWCTY generally outperforms EL on terminal 
and weak systems. Bloem, Ravi, and Somenzi have presented an algorithm that 
is specialized to verify terminal and weak systems efficiently [2]. Linear-time 
model checkers detect bad cycles by using the EL algorithm to check EG true 
over the product of the design and the negation of the desired property. Bloem 
et al. observed that for terminal and weak systems, CTL formulas capture the 
search for bad cycles. Specifically, the formulas EF fair and EF EG fair are true of 
terminal and weak systems, respectively, when they contain infinite fair cycles. 
Accordingly, their algorithm (henceforth BRS) checks one of the formulas EF
fair, EF EG fair, or EG_fair true based on the structure of the input system. This
structure follows from the structure of the property being tested: if a property
corresponds to a weak (resp. terminal) system, the product of that property and
a design model is also a weak (resp. terminal) system. Bloem et al. showed that
BRS significantly outperforms EL in practice on terminal and weak systems.
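
The three checks named above can be made concrete with a small explicit-state sketch, in which plain Python sets stand in for BDDs. This is only an illustration under stated assumptions: the function names, the edge-set encoding, the restriction to a single fairness set, and the three-state example are all hypothetical and are not taken from BRS or from VIS.

```python
def pre(edges, target):
    """EX: states with at least one successor in `target`."""
    return {u for (u, v) in edges if v in target}


def lfp_reach(edges, stay, goal):
    """E[stay U goal] as a least fixpoint over explicit state sets."""
    reach = set(goal)
    while True:
        new = reach | (stay & pre(edges, reach))
        if new == reach:
            return reach
        reach = new


def ef(states, edges, goal):
    """EF goal, i.e. E[true U goal]."""
    return lfp_reach(edges, set(states), goal)


def eg(edges, inv):
    """EG inv: greatest fixpoint of Z = inv & EX Z."""
    z = set(inv)
    while True:
        new = z & pre(edges, z)
        if new == z:
            return z
        z = new


def eg_fair_true(states, edges, fair):
    """EG_fair true for ONE fairness set: nu Z. EX E[Z U (Z & fair)]."""
    z = set(states)
    while True:
        new = pre(edges, lfp_reach(edges, z, z & fair))
        if new == z:
            return z
        z = new


# Hypothetical system: state 1 is the only fair state and loops on itself,
# so every strongly connected component is entirely fair or entirely non-fair.
STATES = {0, 1, 2}
EDGES = {(0, 1), (1, 1), (0, 2), (2, 2)}
FAIR = {1}

terminal_check = ef(STATES, EDGES, FAIR)             # EF fair
weak_check = ef(STATES, EDGES, eg(EDGES, FAIR))      # EF EG fair
general_check = eg_fair_true(STATES, EDGES, FAIR)    # EG_fair true
```

On this example all three checks return the same set of states, which is the situation the specialized formulas rely on; on a system that is neither weak nor terminal only the last check is applicable.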

Table 6 compares OWCTY to BRS.^ For the examples from Table 2, we checked
both EG_fair true and the appropriate formula from BRS using OWCTY. The
statistics on EG_fair true are reproduced from Table 2. The specialized approach
outperforms OWCTY on most of these examples (except the gcd example). This is
due to the difference between checking EG_true fair (BRS) and EG_fair true
(OWCTY). The former restricts the search for a bad cycle to the fair states; the
latter looks for a cycle that intersects the fair states. As a result, both EL and
OWCTY can have non-fair states in their approximation sets, while BRS's
approximation set contains only fair states. This restriction usually allows BRS
to converge faster.

^ The gcd and fpmult examples are the same as Bloem et al. used in their paper [2]. Our
resource usage on these examples differs widely from theirs due to differences between
our two versions of the compiler from Verilog to BLIF, the VIS input language.

Experiment      Procedure              EX    Time      Mem     peak
                                             (sec)     (MB)    BDD nodes
ethernet 1      -EF EG fair            53    4.2       11.2    151306
G(p -> Fq)      EG_fair true (OWCTY)   57    5.5       11.7    175118
ethernet 2      -EF EG fair            109   24.4      13.7    381839
G(p -> Fq)      EG_fair true (OWCTY)   113   59.6      14.1    404723
ethernet 3      -EF EG fair            173   13.3      13.6    287787
G(p -> Fq)      EG_fair true (OWCTY)   177   24.6      13.8    290593
ethernet 4      -EF EG fair            241   145.6     14.0    373531
G(p -> Fq)      EG_fair true (OWCTY)   245   491.4     14.1    368225
treearb 8*      -EF EG fair            22    4.1       12.6    200529
G(p -> Fq)      EG_fair true (OWCTY)   24    4.2       12.7    206640
gcd             -EF EG fair            20    3351.6    193     8204281
G(p -> XFq)     EG_fair true (OWCTY)   24    2497.5    130.9   6285856
fpmult          -EF fair               8     5565.5    329     16109729
G(p -> XXXq)    EG_fair true (OWCTY)   17    22457.2   369     17422253

Table 6. Comparison between the OWCTY and BRS algorithms.

This comparison demonstrates how exploiting structural information about 
systems can lead to more efficient verification algorithms. Note, however, that 
BRS is not a generic cycle-detection algorithm. Furthermore, we must also con- 
sider the cost of determining whether a system is weak or terminal, which is 
not included in our paper or in Bloem et al.'s. In theory, this operation can be
done symbolically in O(n log n) time [1], but experimental results are not yet
available. For the simple properties considered by Bloem et al. and here, this 
overhead is insignificant; for more complicated properties (such as those includ- 
ing complex environmental assumptions) it could be rather substantial. OWCTY, 
which is a generic algorithm, performs well in practice without the overhead of 
specialized analyses as required in BRS. 



5 Conclusions 

Symbolic model checking remains a heuristic process, as metrics do not yet exist 
to predict BDD behavior under differing algorithms. As a result, comparative 
analyses of algorithms are extremely useful in helping tool developers choose 
which algorithms to implement. In the name of good science, these analyses need 
to be reproducible and portable to the greatest extent possible. Such analyses
provide not only firm data, but a foundation for future algorithm development. 

This paper compares three symbolic cycle-detection algorithms (and a variant 
on one of them) based on the number of iterations they take through their 
outermost fixpoint operator, as well as the number of image operations they 
perform. Each algorithm employs a slightly different strategy for pruning the 
set of states potentially lying on cycles. Our analysis shows that the original 
Emerson-Lei (EL) algorithm [5] performs too little work outside of its internal
iterations, while Hardin et al.'s Catch-Them-Young (CTY) algorithm [8] performs
too much. In contrast, Hojati's EL2 algorithm [10], which we view as a one-way
version of CTY (OWCTY), does seem to balance the work inside and outside the
internal iterations. On random examples and on terminal and weak systems, 
OWCTY dominates EL, while on general systems, OWCTY is competitive with 
EL, dominating it significantly in many cases. We have also shown that the two 
algorithms are incomparable with respect to the number of image computations 
they perform: EL can have a quadratic overhead over OWCTY, while OWCTY can 
have a linear overhead over EL. These results support our conclusion that model 
checkers need to contain both EL and OWCTY. 
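
To illustrate how the two strategies distribute their pre-image computations, here is an explicit-state sketch of both fixpoints with a counter on the pre-image operation. It rests on several assumptions: the formulations used (EL as nu Z. AND_c EX E[Z U (Z and c)], OWCTY as one EU pass per fairness set followed by EX applied to convergence) are the commonly cited ones and are not quoted from this section; the `Graph` class, its `pre_calls` counter, and the example graph are hypothetical; and counts obtained on explicit sets say nothing about BDD sizes.

```python
class Graph:
    """Explicit-state stand-in for a symbolic transition relation (hypothetical helper)."""

    def __init__(self, states, edges):
        self.states = set(states)
        self.edges = set(edges)
        self.pre_calls = 0  # number of pre-image (EX) computations performed

    def pre(self, target):
        self.pre_calls += 1
        return {u for (u, v) in self.edges if v in target}

    def eu(self, stay, goal):
        """E[stay U goal] as a least fixpoint."""
        reach = set(goal)
        while True:
            new = reach | (stay & self.pre(reach))
            if new == reach:
                return reach
            reach = new


def emerson_lei(g, fair_sets):
    """EL-style: every outer pass applies EX E[Z U (Z & c)] for each fairness set c.
    Assumes at least one fairness set."""
    z = set(g.states)
    while True:
        new = set(z)
        for c in fair_sets:
            new &= g.pre(g.eu(z, z & c))
        if new == z:
            return z
        z = new


def owcty(g, fair_sets):
    """OWCTY-style: one EU per fairness set, then EX repeated until convergence."""
    z = set(g.states)
    while True:
        old = set(z)
        for c in fair_sets:
            z = g.eu(z, z & c)      # keep states that can reach c inside z
        while True:                 # prune states with no successor inside z
            new = z & g.pre(z)
            if new == z:
                break
            z = new
        if z == old:
            return z


# Hypothetical example: a chain into a fair self-loop, plus an unfair self-loop.
STATES = range(5)
EDGES = {(0, 1), (1, 2), (2, 3), (3, 3), (4, 4)}
FAIR_SETS = [{3}]

g_el, g_ow = Graph(STATES, EDGES), Graph(STATES, EDGES)
el_result = emerson_lei(g_el, FAIR_SETS)
ow_result = owcty(g_ow, FAIR_SETS)
print(g_el.pre_calls, g_ow.pre_calls)  # pre-image counts for this one example
```

Both functions return the same set of states; only the number and order of pre-image calls differ, which is the quantity the comparison above is about.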

In the course of this project, we have identified two desired features for
verification tools. First, we want tools to implement multiple algorithms for common
problems such as cycle-detection. Both our analysis and the recent one by Ravi
et al. [16] indicate that no algorithm consistently outperforms the others; indeed,
verification tasks may be tractable with one algorithm and intractable with an- 
other. Tools providing multiple algorithms afford human verifiers opportunities 
to experiment and find algorithms that work on their applications. A similar 
conclusion in the context of semi-exhaustive reachability analysis was reached 
in [6]. Second, we want tools to provide visualizations of computational patterns 
during model checking. Intel's Palette [12] does some of this; we wish we had
such a tool to augment VIS and other publicly-available tools. Testbeds support- 
ing multiple algorithms and better data collection would provide strong support 
for more disciplined approaches to algorithm comparisons in verification. 
Abstract. We propose a new validation algorithm for bounded Petri
Nets. Our method combines state enumeration and structural techniques
in order to compute under-approximations of the reachability set and
graph of a net. The method is based on two heuristics that exploit
properties of T-semiflows to detect acyclic behaviors. T-semiflows also give us
a heuristic estimation of the number of levels of the reachability graph
we have to keep in memory during forward exploration. This property
allows us to organize the space used to store the reachable markings as a
circular array, reusing all markings outside a sliding window containing
a fixed number of the last levels of the graph. We apply the method to
examples taken from the literature [ABC+95, CM97, MCC97]. Our
algorithm returns exact results in all the experiments. In some examples, the
circular memory allows us to save up to 98% of memory space, and to
scale up to 255 the number of tokens in the specification of the initial
marking.



1 Introduction 

Bounded Petri Nets (PNs) are finite-state concurrent systems in which the max- 
imal number of processes (tokens) in any possible state (place) is bounded by 
a constant. Though decidable, the verification of safety and liveness properties 
of bounded PNs is a very hard problem in practice. Following the literature 
in the field [STC98,Val98], the techniques used to attack this problem can be 
distinguished into the following classes. 

State Enumeration Techniques. The reachability graph of a finite-state system 
built using an exhaustive search algorithm [Hol88] is a complete tool for the 
verification of safety and liveness properties. This technique suffers from the 
state explosion problem, i.e., the explosion of the size of the reachability graph 
compared to the size of the specification [BCB+90, Val98]. Partial search [Hol88]
can be used as a heuristic to validate large finite-state systems. In general,
partial search returns under-approximations of the reachability graph. Therefore,
it cannot be used for verification purposes, but only for simulation and testing.
When incorporated in search algorithms, efficient data structures like hash tables
[Hol88], BDDs [BCB+90], and Sharing Trees [GGZ95] represent other important
heuristics to alleviate state explosion.

Structural Techniques. While state enumeration is a general-purpose technique
for validation of finite-state systems, verification techniques based on structural
properties are a distinguishing feature of PNs [STC98]. These techniques work
without explicitly computing the reachability graph. They rely on linear
programming (synonymous with efficiency), usually returning approximated answers.
For instance, the state equation [Rei86] can be used to over-approximate the
reachability set of a PN, and thus to verify safety properties [STC98]. Other
techniques like traps can be used to improve the precision of the state equation
[EM00].

Our Contribution. In contrast with traditional uses of structural theory, in this
paper we investigate the combination of enumerative and structural techniques 
for validating and debugging systems modeled as bounded PNs. Specifically, we 
use structural properties as heuristics to guide the search during state explo- 
ration. In order to attack state explosion we incorporate our heuristics within a 
partial search algorithm, and we leave open the possibility of using efficient data 
structures for storing intermediate results. 

More precisely, the algorithm we propose explores part of the state-space of a 
PN using properties of minimal T-semiflows in order to detect acyclic occurrence 
sequences without having to search for visited markings. Minimal T-semiflows 
form a system of generators (the fundamental set) for all the positive integer 
solutions of the system of equalities 

$C \cdot x = 0$, $C$ being the token flow matrix.

To apply our heuristics, we require the fundamental set to be integral, i.e.,
T-semiflows must be non-negative integer combinations of minimal T-semiflows.
This condition is satisfied by several case-studies we have found in the
literature (see Section 5). Integrality is a new property we introduce on the basis
of classical notions of linear programming [Sch94]. Our algorithm returns an
under-approximation of the reachability graph, while automatically measuring
the quality of the approximation. Specifically, a flag is raised whenever the
returned graph is an equivalent representation of the reachability graph. Thus, in
an ideal situation our validation method can also be used as a complete tool
for verification. At any moment during the execution, the algorithm works on a
sliding window that covers the last levels of the partially constructed graph. The
number of levels included in the sliding window is computed statically, using
again minimal T-semiflows. This property gives us an estimation of the number
of levels of the reachability graph we need to keep in memory during forward
exploration. We exploit this information to build the following garbage
collection procedure: we organize the main memory as a circular array, and we re-use
the memory allocated to all markings outside the window.
In order to test the applicability of our assumptions and the quality of our
heuristics, we ran a prototype implementation of the algorithm (without use of
dedicated data structures to store the markings) on several examples taken from
[ABC+95, CM97, MCC97]. Our aim was to check safety properties and compute
the reachability set. The preliminary results seem very promising. In some of the
examples, we were able to scale up the number of tokens in the initial marking to
255, and to handle the resulting PN using only 25 Mbytes of main memory (within
the range of the RAM memory of a personal computer, see Section 5). Without
the sliding window the same examples would have required approximately 1,300
Mbytes of memory, i.e., our heuristics can save up to 98% of memory space.
Finally, we obtained an exact representation of the reachability set in all our
experiments, i.e., with our method we were able to verify all safety properties
taken into consideration.

Plan of the paper. In Section 2, we recall the main properties of the Structural 
Theory of Petri Nets. In Section 3, we introduce the notions necessary to our 
algorithm. In Section 4, we present the heuristics and the validation algorithm. 
In Section 5, we discuss our experimental results. Finally, in Sections 6 and 7
we discuss related work and future directions of research, respectively. The
extended version of this paper (containing the proofs of all results) is available
as technical report [CDC00].

2 Structural Theory for Petri Nets 

Following [STC98], a PN $N$ is a tuple $(P, T, Pre, Post, m_0)$, where $P$ is the
finite set of places, $T$ is the finite set of transitions, $Pre$ and $Post$ are the
$|P| \times |T|$ sized incidence matrices, and $m_0$ is the initial marking. The matrix
$C = Post - Pre$ is called token flow matrix. A marking $m = (m_1, \ldots, m_n)$ is a
vector of natural numbers of dimension $n = |P|$. We will use $0$ to denote the null
vector $(0, \ldots, 0)$. Given two vectors $m = (m_1, \ldots, m_n)$ and $m' = (m'_1, \ldots, m'_n)$,
we define $m \geq m'$ if and only if $m_i \geq m'_i$ for $i : 1, \ldots, n$. Similarly, we can define
$m = m'$, whereas $m > m'$ holds if and only if $m \geq m'$ and $m \neq m'$.

Occurrence sequences and Parikh vectors. Let $N$ be a PN with token flow
matrix $C$, $n$ places, and $m$ transitions $t_1, \ldots, t_m$. A transition $t \in T$ is enabled at
marking $m$ if $m \geq Pre[P, t]$, i.e., there are enough tokens to fire $t$. The firing of
the transition $t$, namely $m \xrightarrow{t} m'$, yields a new marking $m' = m + C[P, t]$. An
occurrence sequence from $m$ is a sequence of transitions $\sigma = s_1 \ldots s_k$ such that
$m \xrightarrow{s_1} \ldots \xrightarrow{s_k} m_k$. The reachability set is denoted by $\mathcal{R}(N, m_0)$. The reachability
graph is denoted by $\mathcal{G}(N, m_0)$. The state equation is defined as the system of
equalities

$m' = m_0 + C \cdot x$,

where $m'$ and $x$ are vectors of variables that range over positive integers. The
Parikh vector $p_\sigma$ associated to a finite occurrence sequence $\sigma$ is defined as follows:

$p_\sigma = (Occ_{t_1}(\sigma), \ldots, Occ_{t_m}(\sigma))$,

where $Occ_{t_i}(\sigma)$ = number of occurrences of $t_i$ in $\sigma$. In the following we will use
$x, y, \ldots$ to denote vectors of natural numbers with dimension $m = |T|$ (for clarity,
always referred to as Parikh vectors).

T-semiflows and Fundamental Set. An integer vector $x$ of dimension $m$ is called
a T-flow if and only if

$C \cdot x = 0$, where $C$ is the token flow matrix.

The following proposition relates T-flows and cyclic sequences.

Proposition 1 (From [STC98]). Let $N$ be a PN, and let $m \xrightarrow{\sigma} m'$. Then,
the Parikh vector associated to $\sigma$ is a T-flow if and only if $m = m'$.
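
The definitions of this section can be checked on a tiny example. The sketch below is a minimal illustration, assuming a hypothetical two-place, two-transition net (a token circulating between two places) encoded as plain Python lists; the helper names `enabled`, `fire`, `parikh`, and `mat_vec` are invented for this example and do not come from any Petri net library.

```python
def enabled(pre_matrix, marking, t):
    """t is enabled at m if m >= Pre[P, t] componentwise."""
    return all(marking[p] >= pre_matrix[p][t] for p in range(len(marking)))


def fire(c, marking, t):
    """Firing rule: m' = m + C[P, t]."""
    return tuple(marking[p] + c[p][t] for p in range(len(marking)))


def parikh(sigma, n_trans):
    """Parikh vector: number of occurrences of each transition in sigma."""
    vec = [0] * n_trans
    for t in sigma:
        vec[t] += 1
    return tuple(vec)


def mat_vec(c, x):
    """Matrix-vector product C . x."""
    return tuple(sum(row[t] * x[t] for t in range(len(x))) for row in c)


# Hypothetical net: t0 moves the token from p0 to p1, t1 moves it back.
PRE = [[1, 0],
       [0, 1]]
POST = [[0, 1],
        [1, 0]]
C = [[POST[p][t] - PRE[p][t] for t in range(2)] for p in range(2)]
M0 = (1, 0)

sigma = [0, 1]                      # occurrence sequence t0 t1
m = M0
for t in sigma:
    assert enabled(PRE, m, t)
    m = fire(C, m, t)

p_sigma = parikh(sigma, 2)
# State equation: m' = m0 + C . p_sigma
state_eq = tuple(a + b for a, b in zip(M0, mat_vec(C, p_sigma)))
# Proposition 1: p_sigma is a T-flow (C . p_sigma = 0) iff the sequence is a cycle.
is_t_flow = all(v == 0 for v in mat_vec(C, p_sigma))
```

Here the sequence returns to the initial marking, and its Parikh vector (1, 1) is indeed a T-flow, as Proposition 1 requires.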

A T-semiflow is a T-flow $x$ such that $x > 0$. A minimal T-semiflow is a
T-semiflow $x$ such that: the greatest common divisor of all its positive components
is equal to 1, and there is no T-semiflow $y$ such that the set of non-zero
components of $y$ is strictly contained in that of $x$. The fundamental set of T-semiflows,
say $\mathcal{F}$, of $N$ is the set of minimal T-semiflows of $N$. The fundamental set can be
computed using a variation of the Gaussian elimination method. The number of
minimal T-semiflows of a PN $N$ could be exponential in the size of $N$ [STC98].
T-semiflows enjoy the following properties.

Theorem 1 (From [STC98]). Let $N$ be a PN with fundamental set $\mathcal{F} =
\{x_1, \ldots, x_k\}$. Every T-semiflow $y$ can be obtained as a non-negative linear
combination with rational coefficients of the minimal T-semiflows, i.e., $y = c_1 x_1 +
\ldots + c_k x_k$, where $x_i \in \mathcal{F}$, $c_i \in \mathbb{Q}$, and $c_i \geq 0$ for $i : 1, \ldots, k$.

In the following we will call $Lin_{\mathbb{Q}^+}(\mathcal{F})$ ($Lin_{\mathbb{Z}^+}(\mathcal{F})$) the set of vectors obtained as
non-negative linear combinations with rational (integer) coefficients of vectors
in $\mathcal{F}$. From Theorem 1 and Prop. 1, we obtain the following corollary.

Corollary 1 (Cycle $\Rightarrow$ T-semiflow [STC98]). Let $N$ be a PN, and let
$m \xrightarrow{\sigma} m'$. If $m = m'$, i.e., $\sigma$ is a cycle in $\mathcal{G}(N, m)$, then $p_\sigma \in Lin_{\mathbb{Q}^+}(\mathcal{F})$.

The reverse implication might not hold. A counterexample of a PN in which a 
T-semiflow is not realizable (all paths denoted by the T-semiflow are not valid 
occurrence sequences) is given by Reisig in [Rei86]. Note that for Free-choice 
PNs [DE95] minimal T-semiflows are always realizable. Unfortunately, this class 
does not permit to model interesting examples of mutual-exclusion algorithms. 



3 Towards the Combination with State Enumeration 

Our starting point consists in a reformulation of the standard exhaustive search 
algorithm using Parikh vectors. The unique goal of this preliminary step is to 
simplify the integration of our structural heuristics in the enumerative approach.
Algorithm ES(N, 𝒫) : Boolean.

  N = (P, T, Pre, Post, m0) : PN;
  𝒫: the safety property;
  Old := ∅;
  New := {m0};
  while (New nonempty) do
    m = element from New;
    if not(𝒫(m)) then return(false);
    for every t_i ∈ T enabled at m do
      m' = m + C[P, t_i];
      if (m' ∉ Old ∪ New) then
        add m' to New;
    endf;
    add m to Old;
    delete m from New;
  endw;
  return(true).



Algorithm DES(N, 𝒫) : Boolean.

  N = (P, T, Pre, Post, m0) : PN;
  𝒫: the safety property;
  y0 := 0;
  Old := ∅;
  New := {y0};
  while (New nonempty) do
    y = element from New;
    if not(𝒫(M(y))) then return(false);
    for every t_i ∈ T enabled at M(y) do
      y' = y[y_i := y_i + 1];
      if (M(y') ∉ M(Old ∪ New)) then
        add y' to New;
    endf;
    add y to Old;
    delete y from New;
  endw;
  return(true).



Fig. 1. Two formulations of the Type 1 Reachability Algorithm for PNs. 



3.1 An Encoding Based on Parikh Vectors 

Let $N$ be a PN with $n$ places, $m$ transitions, token flow matrix $C$, and initial
marking $m_0$. Following [Hol88], the exhaustive search procedure ES (exhaustive
search) of Fig. 1 builds the complete reachability set (graph) storing the set of
visited markings in the variable Old. The procedure ES can be reformulated
using a representation of a reachable marking $m$ via the Parikh vector
associated to the path $\sigma$ such that $m_0 \xrightarrow{\sigma} m$. In fact, from the state equation we
know that

$m = m_0 + C \cdot p_\sigma$.

A Parikh vector $x$ can be used as a concise representation for all realizable paths
$\sigma$ starting from $m_0$ such that $p_\sigma = x$. Given a Parikh vector $y$ we define the
marking $M(y)$ associated to $y$ as

$M(y) = m_0 + C \cdot y$.

Note that $M(y_0) = m_0$ whenever $y_0 = 0$. Furthermore, given a set of Parikh
vectors $S$ we define

$M(S) = \{ m \mid m = M(y),\ y \in S \}$.

Using the mapping $M(\cdot)$, we can reformulate the forward reachability algorithm
representing explicitly the Parikh vectors underlying every marking, as shown in
the dual exhaustive search procedure DES (dual exhaustive search) of Fig. 1. In
the algorithm DES (the skeleton of ES), firing a transition $t_i$ enabled at $M(y)$
modifies a vector $y = (y_1, \ldots, y_m)$ as follows

$y[y_i := y_i + 1] = (y_1, \ldots, y_{i-1}, y_i + 1, y_{i+1}, \ldots, y_m)$.
Suppose we run the two algorithms of Fig. 1 in parallel; then the following
properties hold at any step: $m' = M(y')$, Old $= M($Old$)$, and New $= M($New$)$.
From now on, we will use the algorithm DES as a platform to include the set of
heuristics based on properties of T-semiflows described in the following section.
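
The DES procedure of Fig. 1 can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: the function name `des`, the list-of-lists matrix encoding, the use of a FIFO queue for New, and the two-place example net are assumptions made for the sketch.

```python
from collections import deque


def des(pre_matrix, c, m0, prop):
    """Dual exhaustive search: explore Parikh vectors y, compare markings M(y)."""
    n_places, n_trans = len(m0), len(c[0])

    def marking(y):
        # M(y) = m0 + C . y
        return tuple(m0[p] + sum(c[p][t] * y[t] for t in range(n_trans))
                     for p in range(n_places))

    y0 = (0,) * n_trans
    new = deque([y0])
    seen = {marking(y0)}        # plays the role of M(Old U New)
    old = []
    while new:
        y = new.popleft()
        m = marking(y)
        if not prop(m):
            return False, old
        for t in range(n_trans):
            if all(m[p] >= pre_matrix[p][t] for p in range(n_places)):
                y2 = y[:t] + (y[t] + 1,) + y[t + 1:]   # y[y_t := y_t + 1]
                m2 = marking(y2)
                if m2 not in seen:
                    seen.add(m2)
                    new.append(y2)
        old.append(y)
    return True, old


# Hypothetical net: one token moving between two places.
PRE = [[1, 0], [0, 1]]
C = [[-1, 1], [1, -1]]
M0 = (1, 0)

ok, old_vectors = des(PRE, C, M0, lambda m: sum(m) == 1)
reached = {tuple(M0[p] + sum(C[p][t] * y[t] for t in range(2)) for p in range(2))
           for y in old_vectors}
violated, _ = des(PRE, C, M0, lambda m: m[1] == 0)
```

Each stored vector records one way of reaching its marking, so `reached` is exactly the image M(Old) mentioned above.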



3.2 Sufficient Conditions for Detecting Acyclic Behaviors 

As shown next, the contrapositive of Cor. 1 of Section 2 can be used to devise
sufficient conditions for detecting acyclic occurrence sequences without having
to search for visited markings.

Corollary 2 (Not-T-semiflow $\Rightarrow$ Not-Cycle). Let $N$ be a PN, and let
$m \xrightarrow{\sigma} m'$. If $p_\sigma \notin Lin_{\mathbb{Q}^+}(\mathcal{F})$ then $\sigma$ is not a cycle in $\mathcal{G}(N, m)$.

This property goes well together with our formulation of the forward
reachability algorithm using Parikh vectors. Before going into more detail, let us first
analyze the cost needed to check the condition $p_\sigma \notin Lin_{\mathbb{Q}^+}(\mathcal{F})$ of Cor. 2. To test
this condition, we must solve a linear problem with rational solutions
(polynomial in the size of $\mathcal{F}$). Are there more efficient sufficient conditions (e.g. linear
in $\mathcal{F}$) we can use? To answer this question, let us introduce the following new
notion.

Definition 1 (Integral Fundamental Set). We say that the fundamental
set $\mathcal{F}$ is integral whenever $x \in Lin_{\mathbb{Q}^+}(\mathcal{F})$ implies that $x \in Lin_{\mathbb{Z}^+}(\mathcal{F})$, i.e.,
all T-semiflows can be computed using non-negative combinations with integer
coefficients.

Under the assumption that $\mathcal{F}$ is integral, the following theorem can be used as
a sufficient condition for detecting acyclic behaviors.

Theorem 2 (Sufficient Condition for Not-T-semiflow). Let $N$ be a PN
with $m$ transitions, and integral fundamental set $\mathcal{F} = \{x_1, \ldots, x_k\}$, where $x_i =
(x_{i,1}, \ldots, x_{i,m})$. Furthermore, let $m \xrightarrow{\sigma} m'$, and $p_\sigma = (y_1, \ldots, y_m)$ the Parikh
vector associated to $\sigma$. If for all $i : 1, \ldots, k$ there exists $j \in \{1, \ldots, m\}$ such that
$x_{i,j} > y_j$, then for all non-empty subpaths $\sigma'$ of $\sigma$, $p_{\sigma'} \notin Lin_{\mathbb{Q}^+}(\mathcal{F})$.

The cost of checking the condition of Theorem 2 is linear in the cardinality of
$\mathcal{F}$. The cardinality of $\mathcal{F}$ is potentially exponential in the size of $N$, but it is often
linear in practice (see Section 5). As a remark, note the difference between the
hypotheses of Theorem 2, and those of Prop. 1, namely $C \cdot p_\sigma \neq 0$. If Theorem
2 holds, then all subpaths contained in the path $\sigma$ from $m$ to $m'$ are acyclic.
In contrast, if $C \cdot p_\sigma \neq 0$, then we deduce that only the paths from $m$ to $m'$
are acyclic. However, it is easy to build a Petri Net for which there exist three
markings $m$, $m'$ and $m''$ such that $m \xrightarrow{\sigma_1} m' \xrightarrow{\sigma_2} m''$, $C \cdot p_{\sigma_1 + \sigma_2} \neq 0$, and
$C \cdot p_{\sigma_2} = 0$.
3.3 Checking the Integrality of $\mathcal{F}$

To check the integrality of the fundamental set, we can use the notion of total
unimodularity [Sch94]. A matrix $A$ with integer coefficients is totally unimodular
if every subdeterminant of $A$ is 0, 1 or −1. From [Sch94], we know that if $A$ is
totally unimodular, then the extreme points of the set of solutions of $A \cdot x = b$ are
integer numbers for any vector $b$. Furthermore, to check the total unimodularity
of $\mathcal{F}$, we can use the following (polynomial-time) criterion on the matrix with
minimal T-semiflows as rows.

Theorem 3 (From [Sch94]). Let $A$ be a matrix with two non-zero coefficients
in each column. $A$ is totally unimodular iff its rows can be split into two classes
such that for each column: if the non-zero coefficients in the column have the same sign then
they are in different classes, and if they have opposite signs they are both in the
same class.

Thus, if $\mathcal{F}$ forms a totally unimodular matrix, then $x \in Lin_{\mathbb{Q}^+}(\mathcal{F})$ if and only if
$x \in Lin_{\mathbb{Z}^+}(\mathcal{F})$. Perhaps surprisingly, several examples taken from the literature
satisfy the integrality requirement on $\mathcal{F}$. We will return to this point in
Section 5.
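
The row-splitting criterion of Theorem 3 amounts to a two-colouring problem on the rows, which the following sketch solves with a depth-first traversal. It is a minimal illustration under assumptions that go beyond the text: the matrix is given as a list of rows, every column has exactly two non-zero entries, and those entries are +1 or −1 (the restriction under which this criterion is usually stated; the theorem as quoted above does not spell it out). The function name `two_class_split` and both example matrices are hypothetical.

```python
def two_class_split(a):
    """Return a class (0 or 1) for every row, or None if no valid split exists."""
    n_rows, n_cols = len(a), len(a[0])
    adj = {r: [] for r in range(n_rows)}
    for j in range(n_cols):
        nz = [r for r in range(n_rows) if a[r][j] != 0]
        if len(nz) != 2 or any(a[r][j] not in (-1, 1) for r in nz):
            raise ValueError("sketch assumes exactly two +1/-1 entries per column")
        r1, r2 = nz
        # Same sign: rows must be in different classes; opposite signs: same class.
        differ = 1 if a[r1][j] == a[r2][j] else 0
        adj[r1].append((r2, differ))
        adj[r2].append((r1, differ))

    cls = [None] * n_rows
    for start in range(n_rows):
        if cls[start] is not None:
            continue
        cls[start] = 0
        stack = [start]
        while stack:
            r = stack.pop()
            for s, differ in adj[r]:
                want = cls[r] ^ differ
                if cls[s] is None:
                    cls[s] = want
                    stack.append(s)
                elif cls[s] != want:
                    return None      # contradictory constraints: no split
    return cls


# Opposite signs in every column: all rows can share one class.
SPLITTABLE = [[1, 0],
              [-1, 1],
              [0, -1]]
# Three rows that pairwise share a same-sign column: an odd cycle, no split.
NOT_SPLITTABLE = [[1, 1, 0],
                  [1, 0, 1],
                  [0, 1, 1]]

split_ok = two_class_split(SPLITTABLE)
split_bad = two_class_split(NOT_SPLITTABLE)
```

The second matrix has determinant −2, so the failure to find a split agrees with it not being totally unimodular.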

4 Partial Search with Structural Heuristics 

We come now to the definition of our partial search algorithm. Basically, the 
idea is to replace the core of the reachability algorithm DES of Fig. 1 with two 
heuristics selected on the basis of a preliminary comparison of Parikh vectors 
with minimal T-semiflows. The first heuristics exploits Theorem 2 to add mark- 
ings to the set of visited states. The second heuristics applies sufficient conditions 
to localize the search for back-edges in the reachability graph. A Boolean flag (we
will call complete) is used to estimate the quality of the approximation
computed by the heuristics. The resulting partial search PS algorithm is shown in
Fig. 2. To explain it in detail, in the rest of the section we will use the predicate
SFC defined as

$SFC(y) \equiv$ for all $x = (x_1, \ldots, x_m) \in \mathcal{F}$ there exists $i \in \{1, \ldots, m\}$ s.t. $y_i < x_i$,

to denote the comparison between a Parikh vector $y = (y_1, \ldots, y_m)$ and the
minimal T-semiflows of $\mathcal{F}$. Now, let $y'$ be the new Parikh vector generated
during the execution of forward reachability, and let Old and New denote the
set of visited markings.
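
The predicate SFC is a componentwise comparison and can be written directly. In the sketch below the fundamental set is a hypothetical list of two vectors chosen only for illustration; the function name `sfc` mirrors the predicate but is otherwise an assumption of this example.

```python
def sfc(y, fundamental_set):
    """True iff every minimal T-semiflow x has some component with y_i < x_i,
    i.e. y does not dominate any element of the fundamental set."""
    return all(any(yi < xi for yi, xi in zip(y, x)) for x in fundamental_set)


# Hypothetical fundamental set over three transitions.
F = [(1, 1, 0), (0, 1, 1)]

below_all = sfc((1, 0, 0), F)     # t1 has not fired yet, so no semiflow is covered
covers_one = sfc((1, 1, 0), F)    # dominates the first semiflow
```

The cost is one pass over the fundamental set per new vector, which is the linear bound mentioned after Theorem 2.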

The First Structural Heuristics. Suppose $SFC(y')$ holds. From Theorem 2 and
Cor. 2, we can deduce that the marking $M(y')$ does not occur earlier along any
path $\sigma$ with $p_\sigma = y'$ going from $m_0$ to $M(y')$. Under this hypothesis, our heuristics is
defined as follows: without further checks on Old we instruct the algorithm
to immediately add $y'$ to New. The advantage of the heuristics is that we
avoid the cost of searching for (a possible occurrence of) $M(y')$ in the whole
graph. The drawback is that it could introduce redundant markings. In fact,
the marking $M(y')$ may occur in paths unrelated to $y'$ (not captured by Cor.
2). This fact does not influence the termination of the resulting algorithm, as
stated in Theorem 4. We postpone the practical evaluation of the first heuristics
to Section 5.

The Second Structural Heuristics. Suppose that $SFC(y')$ does not hold. Then,
there exists some $x \in \mathcal{F}$ such that $y' \geq x$. In other words, all paths $\sigma$ such that
$p_\sigma = y'$ contain a subpath that is a minimal T-semiflow. Furthermore, since by
definition $C \cdot x = 0$, if we apply the state equation we obtain that

$M(y') = m_0 + C \cdot (y' - x)$.

Our idea is to use the normalized Parikh vector $y' - x$ to guide the search for
a marking $m \in$ Old such that $m = M(y')$. Formally, let the rank of a Parikh
vector $y$ be defined as

$rank((y_1, \ldots, y_m)) = y_1 + \ldots + y_m$.

Furthermore, given a set of Parikh vectors $S$, let the k-th level of $S$ be defined
as

$S[k] = \{ y \mid y \in S,\ rank(y) = k \}$.

Then, if $SFC(y')$ = false, we first search for a marking $m$ such that $m = M(y')$
in all levels Old$[rank(y' - x)]$ with $x \in \mathcal{F}$ and $y' \geq x$. If we find the node we
draw a back-edge. The edge will be part of a cycle. If the previous local search
fails, we discard the vector $y'$, while setting the Boolean flag complete to
false. This way, we inform the user that the algorithm is computing an
under-approximation of the reachability graph. Basically, we substitute the full
termination test $y \in$ Old of the algorithm DES of Fig. 1 with a sufficient condition.
If the flag complete is true when the algorithm terminates the exploration of
the state space, then the resulting reachability graph is exact. The following
theorem formalizes these properties.

Theorem 4. Let $N$ be a bounded PN with integral fundamental set $\mathcal{F}$, $\mathcal{P}$ a
safety property and let $C$ be the value of the flag complete when the algorithm
PS of Fig. 2 returns. Then, (1) the computation of $PS(N, \mathcal{P})$ always terminates
(returning true or false); (2) if $PS(N, \mathcal{P})$ = true and $C$ = true, then $\mathcal{P}$ holds
for $N$; (3) if $PS(N, \mathcal{P})$ = false, then $\mathcal{P}$ does not hold for $N$.

The second heuristics gives us a bound on the number of levels we have to keep
in memory during the exploration of the reachability graph. The bound WS
(window size) is the maximum among the ranks of the minimal T-semiflows in
$\mathcal{F}$, namely

$WS = \max\{ rank(x) \mid x \in \mathcal{F} \}$.

Thus, our algorithm works only on a window of dimension WS that covers the
last levels of the current reachability set. We will present a memory management
based on this property in the next section.
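
The rank, the level structure, and the window size are simple to compute once the fundamental set is known. The sketch below uses a hypothetical fundamental set and a hypothetical set of stored vectors; the helper names are assumptions of the example.

```python
def rank(y):
    """rank(y) = y_1 + ... + y_m."""
    return sum(y)


def level(vectors, k):
    """S[k]: the vectors of S whose rank is k."""
    return {y for y in vectors if rank(y) == k}


def window_size(fundamental_set):
    """WS = max rank over the minimal T-semiflows."""
    return max(rank(x) for x in fundamental_set)


# Hypothetical fundamental set and stored Parikh vectors (three transitions).
F = [(1, 1, 0), (0, 1, 1), (2, 0, 1)]
S = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)}

ws = window_size(F)

# Normalizing y' by a dominated semiflow x points at the level to inspect.
y_new, x = (1, 2, 1), (0, 1, 1)
target_level = rank(tuple(a - b for a, b in zip(y_new, x)))
```

Because the normalized vector differs from the new one by at most WS in rank, only the last WS levels ever need to be consulted.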
Algorithm PS(N, 𝒫): Boolean

  N = (P, T, Pre, Post, m0);
  ℱ: integral fundamental set of N;
  𝒫: the safety property;
  y0 := 0; complete := true;
  New := {y0}; Old := ∅;
  while (New nonempty) do
    y = element from New;
    if not(𝒫(M(y))) then return(false);
    for every t_i ∈ T enabled at M(y) do
      y' = y[y_i := y_i + 1];
      if for all x ∈ ℱ exists i ∈ {1, ..., m} s.t. y'_i − x_i < 0 then
        if (M(y') is not in M(New)) then add y' to New;
      else if (M(y') is not in M(Old[rank(y' − x)]) for some x ∈ ℱ, y' ≥ x) then
        complete := false;
      else add the back-edge;
    endfor;
    add y to Old; delete y from New;
  endw;
  if complete write('Exact RS') else write('Approximated RS');
  return(true).



Fig. 2. A Type 2 Reachability Algorithm. 
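
A possible Python rendering of the algorithm of Fig. 2 is sketched below. It is a hedged reading rather than the authors' prototype: it follows the prose description of the second heuristics (a back-edge is drawn if the marking is found in any of the candidate levels), it keeps every level of Old instead of discarding those outside the sliding window, and the function name, return shape, and example net are assumptions of this sketch.

```python
from collections import deque


def partial_search(pre_matrix, c, m0, fundamental_set, prop):
    """Returns (result, complete, old_levels, back_edges)."""
    n_places, n_trans = len(m0), len(c[0])

    def marking(y):
        return tuple(m0[p] + sum(c[p][t] * y[t] for t in range(n_trans))
                     for p in range(n_places))

    def sfc(y):
        return all(any(yi < xi for yi, xi in zip(y, x)) for x in fundamental_set)

    y0 = (0,) * n_trans
    complete = True
    new = deque([y0])
    new_markings = {marking(y0)}    # M(New)
    old_levels = {}                 # rank k -> M(Old[k])
    back_edges = []
    while new:
        y = new[0]
        m = marking(y)
        if not prop(m):
            return False, complete, old_levels, back_edges
        for t in range(n_trans):
            if not all(m[p] >= pre_matrix[p][t] for p in range(n_places)):
                continue
            y2 = y[:t] + (y[t] + 1,) + y[t + 1:]
            m2 = marking(y2)
            if sfc(y2):
                # First heuristics: no lookup in Old at all.
                if m2 not in new_markings:
                    new.append(y2)
                    new_markings.add(m2)
            else:
                # Second heuristics: look only at levels rank(y' - x), x <= y'.
                ranks = [sum(y2) - sum(x) for x in fundamental_set
                         if all(a >= b for a, b in zip(y2, x))]
                if any(m2 in old_levels.get(k, set()) for k in ranks):
                    back_edges.append((m, m2))
                else:
                    complete = False    # y' is discarded
        new.popleft()
        new_markings.discard(m)
        old_levels.setdefault(sum(y), set()).add(m)
    return True, complete, old_levels, back_edges


# Hypothetical net: one token cycling through two places; F = {(1, 1)}.
PRE = [[1, 0], [0, 1]]
C = [[-1, 1], [1, -1]]
result, complete, levels, edges = partial_search(
    PRE, C, (1, 0), [(1, 1)], lambda m: sum(m) == 1)
```

On this net the search closes the cycle with one back-edge and leaves the flag set, so the computed reachability set is exact.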



5 Experimental Results 

We have implemented a prototype version of the algorithm PS of Fig. 2,
borrowing the graphical interface and the library for computing structural
properties from GreatSPN [CFGR95], and using the following specialized memory
management.



5.1 Organizing the Memory as a Circular Array 

We consider PNs where transitions can be fired at most 255 times. We
organize the available memory (RAM + swap area) as a circular array, where each
slot in the array contains m bytes and stores a Parikh vector (m = number of
transitions). Our representation does not depend on the bound on the number
of tokens in the places. If TM is the size of allocated memory in bytes, the
number of available slots AS is then AS = TM/m. Furthermore, if NS is the
number of reachable vectors, then the virtual memory required to store them is
MS = NS * m bytes. A table maintains the initial and final address of the set
of Parikh vectors of each level. Each level is stored as an ordered list. A sliding
window covering the last WS levels of the reachability graph moves around the
circular array (we defined WS in the previous section). The global size of the
sliding window is the sum of the number of states in each of its levels. By
construction of PS, we can always reuse the states outside the window in successive

CASE-STUDY                                            T    P             SF   ET      WS   I?
Kanban [CM97]                                         16   16            5    0.01s   8    √
Flexible Manufacturing System (FMS) [CM97]            20   22            4    0.04s   13   √
Multipoll [MCC97]                                     21   18            8    0.06s   5
Central Server Model (CSM) [ABC+95] Fig. 76 pp. 154   13   14            4    0.03s   5
Readers-Writers [ABC+95] Fig. 11 pp. 17               7    [illegible]   2    0.02s   4
2x2 Mesh [ABC+95] Fig. 130 pp. 256                    32   32            8    0.07s   5

Fig. 3. Profile of the case-studies: T=number of transitions; P=number of places;
SF (size of $\mathcal{F}$)=number of minimal T-semiflows; ET=CPU execution time to
compute $\mathcal{F}$ using GSPN on a Pentium 133Mhz; WS=size of the sliding window;
I?=is the fundamental set integral?



iterations. An overflow exception OF is raised as soon as the algorithm adds a
slot of the last level of the window to its first level (i.e., the window covers all
memory). NR indicates the number of times the last slot of the sliding window
goes beyond the rightmost limit of the array (NR = 0 means MS < TM).
Finally, the ratio R, defined as 1 - TM/MS, gives us an estimate of the memory
saving we obtain with our heuristics.
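
The quantities AS, MS and R defined above can be made concrete with a short calculation. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes decimal megabytes (1 Mb = 10^6 bytes). The input figures (m = 7 transitions, about 186 million reachable vectors, 25 Mbytes of allocated memory) are the ones reported for Readers-Writers in Section 5.2.

```python
def available_slots(tm_bytes: int, m: int) -> int:
    """AS = TM / m: number of m-byte slots that fit in the allocated memory."""
    return tm_bytes // m


def virtual_memory(ns: int, m: int) -> int:
    """MS = NS * m: bytes needed to store all NS Parikh vectors at once."""
    return ns * m


def saving_ratio(tm_bytes: int, ms_bytes: int) -> float:
    """R = 1 - TM / MS: fraction of memory saved by the circular array."""
    return 1.0 - tm_bytes / ms_bytes


MB = 10**6            # assumption: decimal megabytes
m = 7                 # one byte per transition of Readers-Writers
ns = 185_977_536      # reachable vectors reported for 255 initial tokens
tm = 25 * MB          # allocated memory

ms = virtual_memory(ns, m)
slots = available_slots(tm, m)
ratio = saving_ratio(tm, ms)
print(f"MS = {ms} bytes, AS = {slots} slots, R = {ratio:.1%}")
```

Under these assumptions R rounds to 98%, which agrees with the saving reported for this configuration.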



5.2 Practical Evaluation 

At this stage of our work, the purposes of the experiments were: (1) testing the
applicability of the assumptions under which the algorithm works (the existence
of an integral fundamental set); (2) testing the quality and efficiency of our
heuristics; (3) testing the scalability of the specialized memory management.

Applicability. To make the tests more interesting, we considered models of concurrent
and production systems taken from [ABC+95,CM97,MCC97]. Furthermore,
in order to study the scalability of our approach we restricted ourselves to
systems with parametric initial markings, where the parameter is the
number of initial tokens in some given places of the net. For these examples, we
were interested in computing the set of reachable states, so as to prove safety
properties like mutual exclusion. As shown in Fig. 3, most of the examples in
[ABC+95,CM97,MCC97] with the previous characteristics turned out to have
an integral fundamental set. We computed $\mathcal{F}$ using the structural library of GSPN
within negligible execution times (see again Fig. 3). We remark that only the
Kanban system of [CM97] is a free choice net; all the other examples heavily
rely on the use of semaphores.

Quality and Efficiency. In order to test the quality and efficiency of our heuris- 
tics, we compared the execution times of our prototype with those of GreatSPN 
[CFGR95], one of the more efficient tools for the generation of the reachability 
graph of a PN. We performed all experiments on a Pentium with a clock speed 

CASE-STUDY        NT    ET-Prot     NS-Prot     CF     ET-GSPN    NS-GSPN
Kanban            2     1.530s      4,600       true   0.860s     4,600
                  4     229.070s    454,475     true   158.700s   454,475
                  5     1464.270s   2,546,432   true   ↑          ↑
                  6     ↑           ↑
FMS               2     1.270s      3,444       true   0.460s     3,444
                  4     249.170s    438,600     true   117.770s   438,600
                  5     ↑           ↑                  ↑          ↑
Multipoll         2     5.210s      11,328      true   2.190s     11,328
                  4     56.280s     106,280     true   27.030s    106,280
                  9     1164.750s   1,943,160   true   ↑          ↑
                  10    ↑           ↑
Mesh              2     178.870s    200,544     true   46.150s    200,544
                  3     ↑           ↑                  ↑          ↑
CSM               2     0.020s      76          true   0.010s     76
                  32    23.180s     95,876      true   27.920s    95,876
                  75    311.530s    1,170,704   true   538.450s   1,170,704
                  115   1156.240s   4,162,544   true   ↑          ↑
                  116   ↑           ↑
Readers-Writers   4     0.030s      90          true   0.010s     90
                  32    7.170s      64,889      true   10.250s    64,889
                  62    94.350s     762,384     true   175.300s   762,384
                  114   1069.020s   7,927,295   true   ↑          ↑
                  115   ↑           ↑

NT=Number of Tokens in the initial marking;
ET=CPU Execution Time on a Pentium 133Mhz;
NS=Number of reachable markings;
CF=value of the Complete Flag when PS returns;
↑=memory overflow;
-Prot=executed on our prototype;
-GSPN=executed on GreatSPN [CFGR95].

Fig. 4. First series of experimental evaluations.



of 133Mhz, RAM memory of 32Mbytes, and swap area of 34Mbytes, allocating
a priori 55Mbytes of memory to store the reachability set. The table in Fig. 4
summarizes the results of a first series of experiments. Surprisingly, the algorithm
returned an exact representation (without redundancies) of the reachability set
in all the examples (and for different values of the parameter, i.e., the number of
tokens in the initial marking). In all the experiments of Fig. 4 we never had to exploit
the circularity of our memory organization: 55Mbytes were enough to store the
reachability set. The cost of our heuristics and of the localized search turned out
to be comparable to that of the efficient search of GreatSPN (despite the fact
that GreatSPN also makes use of simplification rules). However, on examples like
Readers-Writers GreatSPN was not able to compute the reachability graph for
nets with more than 62 tokens in the initial marking (as indicated by the overflow
flag ↑). This fact is due to the overhead of a more sophisticated encoding of
markings and to the organization of visited markings as a tree structure [Chi89]
(trade-off between efficient search operations and memory requirements). Both
our prototype and GreatSPN store the edges of the reachability graph on disk.

Readers-Writers (No. trans. m = 7) executed on our prototype

NT    TM       NS            MS         NR   R     ET        CF     OF
255   45 Mb    185,977,536   1,302 Mb   28   96%   27,981s   true
255   35 Mb    185,977,536   1,302 Mb   37   97%   27,996s   true
255   25 Mb    185,977,536   1,302 Mb   52   98%   27,991s   true
255   21 Mb    66,252,650    463 Mb     22   95%   9,719s    true   ↑
128   45 Mb    12,440,544    87.1 Mb    1    48%   1,723s    true
128   35 Mb    12,440,544    87.1 Mb    2    60%   1,722s    true
128   25 Mb    12,440,544    87.1 Mb    3    71%   1,721s    true
128   15 Mb    12,440,544    87.1 Mb    5    82%   1,722s    true
128   5 Mb     12,440,544    87.1 Mb    17   94%   1,723s    true
128   3 Mb     5,631,404     40 Mb      13   92%   766.6s    true   ↑
64    1 Mb     860,145       6 Mb       6    83%   108.3s    true
64    500 Kb   860,145       6 Mb       12   91%   108.5s    true
64    250 Kb   169,728       1.2 Mb     4    25%   19.6s     true   ↑
32    300 Kb   64,889        455 Kb     1    34%   7.33s     true
32    75 Kb    64,889        455 Kb     6    83%   7.38s     true
32    50 Kb    23,099        162 Kb     3    69%   2.38s     true   ↑

NT=number of tokens in the initial marking;
TM=total allocated memory;
NS=number of reachable states;
MS=NS*m;
NR=number of rounds in the circular memory;
R=1-(TM/MS) (saving ratio in pct);
ET=CPU execution time on a Pentium 133Mhz;
CF=complete flag;
OF=overflow flag.

Fig. 5. Second series of experimental evaluations.

Scalability. In order to test the scalability of our method, we performed a second
series of experiments in which we successively reduced the quantity of memory
allocated for storing the reachability set. The aim was to test the efficacy of
the circular implementation of the memory. The results were quite surprising.
For instance, as shown in Fig. 5, we were able to scale up to 255 the number of
tokens in the initial marking of Readers-Writers. In this case the net has
approximately 185 million reachable states. It would take approximately 1300
Mbytes of memory to store the entire reachability set. With our heuristics, we
were able to run the example using only 25 Mbytes of memory, hence saving 98%
of memory space. The memory manager returned an overflow exception (indicated
again with ↑) when we tried to use 21Mb of memory. Furthermore, for an
initial marking with 128, 64, and 32 tokens we were able to compute the reachability
set saving (approximately) 94% (TM=5Mbytes), 92% (TM=0.5Mbytes),
and 84% (TM=75Kb) of memory space, respectively. We obtained similar results
for the CSM example. The results on the other examples were less appealing,
though we also managed to scale up FMS to an initial marking with 5 tokens.
However, we believe that more results will be obtained by using efficient data
structures to store sets of markings.

6 Related Works 

As mentioned in the introduction, structural techniques are traditionally used to
compute over-approximations of the reachability set, see e.g. [STC98]. In [EM00],
traps are used to improve the quality of the approximation. Place invariants can
also be used to over-approximate the reachability set. Place invariants are the
dual notion of T-semiflows, i.e., the solutions of the system $y \cdot C = 0$. Let $P$ be the
matrix of minimal P-semiflows. As shown in [STC98], the set of solutions of the equation
$P \cdot m = P \cdot m_0$ over-approximates the set of solutions of the state equation
$m = m_0 + C \cdot \sigma$, i.e., over-approximates the reachability set. In contrast, in our
approach we have used T-semiflows to find under-approximations (useful for debugging)
and to derive conditions to establish the quality of the approximation.
Furthermore, differently from [EM00,STC98], our approach is incorporated in
state enumeration. We are not aware of other approaches where T-semiflows are
used for under-approximating the reachability set. In [MC99], Miner and Ciardo
use MDDs (Multi-valued Decision Diagrams) to store the reachability set,
whereas Pastor et al. [PCP99] use P-invariants (semiflows) to improve a BDD-based
encoding of the reachability set. Other compact data structures (like Sharing
Trees) are tested on reachability problems of bounded PNs in [GGZ95,ST99].
As mentioned in the introduction, our heuristics could be incorporated, e.g., in
a BDD-based framework. Our use of heuristics shares some similarities with
depth-first search algorithms [Hol88,Val98] for state enumeration, an approach
used to compute an under-approximation of the reachability graph. In fact, our
heuristics give us conditions to detect acyclic paths of the reachability graph
that go from the initial marking to the current marking. However, note that the
use of Parikh vectors allows us to check the absence of cycles on collections of
paths (all paths and related subpaths represented by the vector). Furthermore,
the use of the second heuristic allows us to obtain more accurate information
w.r.t. a generic depth-first search where only the current path is memorized.
Depth-first search algorithms combined with methods for storing visited markings
have been proposed in [JJ91,MK96]. As heuristics for garbage collection, in
[JJ91] Jard and Jeron propose to discard states selected randomly from the set
of visited markings, whereas Miller and Katz in [MK96] select the states to discard
using their revisiting degree. Differently from [JJ91,MK96], our method is
based on a breadth-first search, in which we use the rank of minimal T-semiflows
(i.e., heuristics peculiar to Petri Nets) to guide garbage collection.
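
The place-invariant over-approximation recalled at the beginning of this section can be illustrated on a very small net. The Python sketch below is not taken from any of the cited works: the net (one transition moving two tokens from p1 to p2), the names PRE, POST, M0, Y and all helper functions are assumptions made for the example. It checks that Y is a P-semiflow and then compares the markings that satisfy the invariant equation with the markings that are actually reachable.

```python
from itertools import product

# Hypothetical net with places (p1, p2) and a single transition t1 that
# consumes two tokens from p1 and produces two tokens in p2.
PRE = [(2, 0)]    # PRE[t][p]: tokens consumed from place p by transition t
POST = [(0, 2)]   # POST[t][p]: tokens produced in place p by transition t
M0 = (2, 0)       # initial marking
Y = (1, 1)        # candidate P-semiflow


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_p_semiflow(y):
    """y * C = 0, where column t of C is POST[t] - PRE[t]."""
    return all(dot(y, [q - p for p, q in zip(pre, post)]) == 0
               for pre, post in zip(PRE, POST))


def reachable(m0):
    """Explicit enumeration of the markings reachable from m0."""
    seen, frontier = {m0}, [m0]
    while frontier:
        m = frontier.pop()
        for pre, post in zip(PRE, POST):
            if all(a >= b for a, b in zip(m, pre)):        # transition enabled
                nxt = tuple(a - b + c for a, b, c in zip(m, pre, post))
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return seen


def invariant_solutions(y, m0):
    """Markings m with y * m = y * m0 (assumes every entry of y is >= 1)."""
    bound = dot(y, m0)
    return {m for m in product(range(bound + 1), repeat=len(m0))
            if dot(y, m) == bound}


exact = reachable(M0)
approx = invariant_solutions(Y, M0)
print(sorted(exact), sorted(approx))
```

In this toy net the marking (1, 1) satisfies the invariant equation but is not reachable, which is the sense in which the invariant-based set is an over-approximation.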

7 Conclusions

We have presented a new algorithm for validating concurrent systems modeled
as bounded Petri Nets. Our method combines forward state exploration with
two structural heuristics based on the properties of T-semiflows. One of the main
features of our heuristics is that they give us an estimate of the number of levels
of the reachability graph we need to keep in memory. Using this measure, we can
organize main memory as a circular array, so as to garbage collect states outside
the current working window. In our prototype, information for computing error
traces is stored on disk. In this preliminary work, we were mainly interested in
evaluating the applicability of the method (are there interesting examples that
fulfill our assumptions?), and the efficacy of the specialized memory management
(can we save memory?). In this respect, we think that our results are
quite promising (see Section 5). For a better evaluation of the approach (e.g.
to compare its scalability w.r.t. BDD-based approaches like [MC99,PCP99]), we
plan to integrate efficient data structures within our preliminary naive implementation
of the algorithm (in which vectors are stored as sequences of slots, as
described in Section 5). Finally, it would be interesting to study the applicability
of similar techniques for the validation of infinite-state systems, e.g., integrated
in approaches like [DR00].

Acknowledgements. The authors would like to thank Javier Esparza for having 
pointed out to us several important references, Jean-Francois Raskin for fruitful 
discussions, and the anonymous reviewers for useful suggestions and the pointers 

Abstract. We present a state space exploration method for on-the-fly
verification. The method is aimed at systems for which it is possible
to define a measure of progress based on the states of the system. The
measure of progress makes it possible to delete certain states on-the-fly
during state space generation, since these states can never be reached 
again. This in turn reduces the memory used for state space storage 
during the task of verification. Examples of progress measures are se- 
quence numbers in communication protocols and time in certain models 
with time. We illustrate the application of the method on a number of 
Coloured Petri Net models, and give a first evaluation of its practical- 
ity by means of an implementation based on the Design/CPN state 
space tool. Our experiments show significant reductions in both space 
and time used during state space exploration. The method is not spe- 
cific to Coloured Petri Nets but applicable to a wide range of modelling 
languages. 



1 Introduction 

State space exploration has proven to be powerful for investigating the correct- 
ness of concurrent systems. The basic idea behind state space exploration is to 
construct a directed graph, called the state space, in which the nodes correspond
to the set of reachable states of the system, and the arcs correspond to the state 
changes. Such a state space represents all possible executions of the system, and 
can be used to algorithmically verify and analyse an abundance of properties 
about the system under consideration. 

The main disadvantage of using state spaces is the state explosion problem: 
even relatively small systems may have an astronomical number of reachable 
states, and this is a serious limitation to the use of state space methods in the 
analysis of real-life systems. This has led to the development of many different 
reduction methods for alleviating the state explosion problem. Examples of re- 
duction methods are partial order reduction methods [19,21,22], the symmetry 
method [5,6,14], and the unfolding method [7,17]. Reduction methods represent
the full state space in a compact or condensed form, or represent only a sub- 
set of the full state space. The reduction is done such that the answer to the 
verification questions can still be determined from the reduced state space. 

Reduction methods typically exploit certain characteristics of the system, 
and hence work well for systems possessing these characteristics, but fail to work 
well for systems which do not have these characteristics. An example of this is 
the symmetry method which exploits the symmetry present in many concurrent 
systems, but fails to work on systems which do not possess symmetry. This paper 
presents a sweep-line method for state space exploration. The method is aimed 
at systems for which it is possible to define a measure of progress based on the 
states of the system. Examples of such progress measures are sequence numbers 
in communication protocols and time in certain systems with time. The key 
property of a progress measure is that for a given state s, all states reachable
from s have a progress measure which is greater than or equal to the progress 
measure of s. The progress measure will often be specific for the system under 
consideration. However, for some modelling languages a progress measure can 
be defined based on only the modelling language itself. The progress measure 
then applies to all models of systems constructed in that modelling language. 
Time in Coloured Petri Nets [13, 16] is an example of such a progress measure. 

A progress measure makes it possible to delete certain states on-the-fly during
state space generation, since it ensures that these states can never be reached 
again. Since we only delete states that can never be reached again, we never 
risk processing the same state twice. The sweep-line method therefore ensures 
that the generation terminates for finite-state systems after having visited all 
reachable states. Intuitively, the idea is to drag a sweep-line through the full 
state space, calculate the reachable states in front of the sweep-line and delete 
states behind the sweep-line. 

The sweep-line method makes it possible to investigate a number of inter- 
esting properties of systems, such as deadlocks, reachability, and safety proper- 
ties. Practical experiments with a prototype implementation of the sweep-line 
method show significant savings in both time and space for finite state systems. 
For infinite-state systems the sweep-line method can be used to explore and 
analyse larger prefixes of the state space than can be done with ordinary state 
space exploration. 

Deleting and/or throwing state information away on-the-fly during state
space generation is also the underlying idea of the bit-state hashing method
[10,11] and the state space caching method [9,12]. The basic idea of state space
caching is to make a depth-first generation of the state space, keep the states of 
the depth-first search stack in memory, and allow for deletion of states during 
the state space generation which are not on the depth-first search stack. An ad- 
vantage of the sweep-line method compared to state space caching is that states 
are only generated and processed once. With the state space caching method, 
the same state may be regenerated and processed several times leading to an 
increase in run-time. As shown in [8], this run-time penalty can be reduced
by combining state space caching and partial order methods. The bit-state hashing
method always keeps the states of the depth-first search stack in memory,
but reduces (in its simplest form) the information stored about a single state 
to a hash value (index value). This hash value is then stored using a single bit 
in a bit-vector. A main difference between bit-state hashing and the sweep-line 
method is that with the sweep-line method full coverage of the state space is 
guaranteed. This is not the case with bit-state hashing, since two different states 
may be mapped to the same hash value due to hash collisions. 
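
The loss of coverage caused by hash collisions can be seen in a few lines of code. The following Python sketch is a deliberately simplified illustration, not the implementation of [10,11]: the class name BitStateSet and the toy hash function (sum of the state components modulo the table size) are assumptions chosen so that a collision is easy to reproduce.

```python
class BitStateSet:
    """Visited-state store that keeps one bit per hash value."""

    def __init__(self, n_bits):
        self.n_bits = n_bits
        self.bits = bytearray((n_bits + 7) // 8)

    def _index(self, state):
        # Toy hash function (assumption): sum of the state components.
        return sum(state) % self.n_bits

    def add(self, state):
        i = self._index(state)
        self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, state):
        i = self._index(state)
        return bool(self.bits[i // 8] & (1 << (i % 8)))


visited = BitStateSet(16)
visited.add((1, 2))

collision = (2, 1) in visited   # never added, but hashes to the same bit
fresh = (0, 1) in visited       # hashes to a different, unset bit
print(collision, fresh)
```

A search using this store would treat the state (2, 1) as already visited and never explore its successors, which is why full coverage cannot be guaranteed.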

The idea of using a system specific progress measure to improve state space 
analysis is due to Kurt Jensen. The detailed realisation presented in this paper is 
the responsibility and work of the authors alone. The idea of utilising a measure 
of progress to delete states during on-the-fly verification is intriguingly simple, 
but to the best of our knowledge it has never been documented before. 

The rest of this paper is organised as follows. Section 2 gives an informal 
introduction to the sweep-line method using a small Coloured Petri Net model 
as an example. Section 3 formalises the concept of progress measure. Section 4 
gives the generic state space construction algorithm for the sweep-line method. 
Section 5 shows how this generic algorithm can be tailored for on-the-fly veri- 
fication. Section 6 contains additional examples of progress measures and gives 
some numerical data on the performance of the sweep-line method using these 
examples. Section 7 contains the conclusions and a further discussion of related 
work. 

2 The Sweep-Line Method 

In this section we informally introduce the sweep-line method using a small 
example of a sliding window communication protocol [1]. The protocol makes 
efficient use of the network by sending a certain number of packets, called the 
window size, before it waits for an acknowledgement from the receiver. The pack- 
ets transmitted on the network include a cyclic sequence number used to order 
the packets received from the network correctly at the receiver. For efficiency 
reasons the number of bits reserved for this counter must be kept at a minimum, 
but reserving too few bits can result in erroneous acceptance of packets by the 
receiver. We assume that the network allows both overtaking and loss of packets. 
The purpose of the analysis is to investigate whether the design of the protocol 
is sufficiently robust to work correctly under these conditions. 

We have modelled the sliding window protocol using Coloured Petri Nets 
(CP-nets or CPNs) [13,16], but the sweep-line method is not specific to this 
formalism. Our aim here is not to explain the CPN model of the sliding window 
protocol in great detail, but just to use it for introducing the basic ideas of the 
sweep-line method. 

Figure 1 gives an overview of the CPN model of the sliding window protocol. 
It consists of three modules corresponding to a Sender, a Network, and a Receiver. 
In the CPN model we have supplemented the cyclic sequence number with a non- 
cyclic counter specifying the generation of the cyclic counter. This allows us to 
detect whether the receiver is ever in a situation where a faulty acceptance of a packet
occurs. 







Fig. 1. Module overview of the sliding window protocol. 



The left-hand side of Fig. 2 shows the sender module which has a single 
counter (NextSend) specifying the sequence number of the next packet to be sent. 
The idea of the window mechanism is that it allows the sender to send a number 
of packets without receiving an acknowledgement. Whenever the sender receives 
an acknowledgement from the receiver, the window is updated accordingly. 





Fig. 2. Sender module (left) and Receiver module (right). 



The right-hand side of Fig. 2 shows the receiver module, which has a counter 
(NextRec) specifying the next packet which can be accepted. NextRec is a counter
of the form (gen, seq), where gen is an integer giving the current generation,
and seq is the sequence number of the packet within the generation. If the ex- 
pected packet arrives, the counter is increased and an acknowledgement with the 
sequence number of the next packet is sent back to the sender. If a packet arrives 
out of order, the receiver will respond with an acknowledgement containing the 
sequence number of the packet it was expecting to receive. In the CPN model a
check has been inserted to inspect the validity of the packets being successfully 
received. This means that the Failure state will be marked if and only if the 
receiver ever accepts an invalid packet. 

The aim of the analysis is to check whether the cyclic sequence numbers are 
too small, i.e., whether the Failure state of the receiver will ever be marked. Using 
conventional state space generation, a straightforward way to investigate this
property would be to generate the state space and check for each newly created 
state whether the place Failure is marked. If a state is detected where the place 
Failure is marked, generation can be terminated and an error reported. For the 
sliding window protocol example it is, however, possible to exploit the NextRec 
counter in the receiver to reduce the memory used for state space storage during 
the search. The basic observation is that the NextRec counter has the property 
that as the protocol executes, generation numbers are increasing and so are the 
sequence numbers within each generation. This in turn makes it possible to define 
a progress measure on the states by means of which the progress of the sliding 
window protocol can be quantified, and which makes it possible to compare states 
w.r.t. their progress measure. For a state $s$, let $(gen_s, seq_s)$ denote the value of the
receiver's NextRec counter in $s$. We can then talk about a state $s$ representing a
state where the protocol has progressed further than in another state $s'$ (written
$s' \sqsubset s$) if and only if $(gen_{s'} < gen_s) \lor ((gen_{s'} = gen_s) \land (seq_{s'} < seq_s))$. Since
generation numbers are increasing and so are the sequence numbers within each
generation, the above progress measure has the property that for all successors
$s''$ of a state $s$, $s \sqsubseteq s''$. This means that from a state $s$, it is never possible to
reach a state $s''$ with a progress measure less than the progress measure of $s$.
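
The ordering on NextRec values defined above is the lexicographic order on pairs (gen, seq). As a small illustration, the Python sketch below spells out the condition and checks that it coincides with Python's built-in tuple comparison; the names NextRec and strictly_less are chosen for the example and are not part of the CPN model.

```python
from typing import NamedTuple


class NextRec(NamedTuple):
    gen: int   # generation of the cyclic counter
    seq: int   # sequence number within the generation


def strictly_less(a: NextRec, b: NextRec) -> bool:
    """a has progressed strictly less than b (lexicographic order)."""
    return a.gen < b.gen or (a.gen == b.gen and a.seq < b.seq)


# The condition agrees with tuple comparison on every pair of small values.
values = [NextRec(g, q) for g in range(3) for q in range(4)]
agrees = all(strictly_less(a, b) == (a < b) for a in values for b in values)
print(agrees, strictly_less(NextRec(0, 3), NextRec(1, 0)))
```

Because the order is total, any two states can be compared through their NextRec values, which is what makes a single sweep-line position well defined in this example.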

In conventional state space generation, the states are kept in memory to 
recognise already examined states. However, states which have a progress mea- 
sure which is strictly less than the minimal progress measure of those states for 
which successors have not yet been calculated can never be reached again. It is 
therefore safe to delete such states. Saving memory by deleting such states is the 
basic idea underlying the sweep-line method. The role of the progress measure 
is to be able to recognise such states. 

Figure 3 illustrates the intuition behind the sweep-line method; $s_0$ denotes
the initial state of the system. The two gray areas show the states kept in mem- 
ory. Some of these have been processed (light gray), i.e., their successor states 
have been calculated, and some have only been calculated (dark gray). There 
is a sweep-line through the stored states separating the states with a progress 
measure which is strictly less than the minimal progress measure among the 
unprocessed states, from the states which have progressed further than the min- 
imal unprocessed states. States strictly to the left of the sweep-line can never be 
reached from the unprocessed states and can therefore safely be deleted. As the 
state space generation proceeds, the sweep-line will move from left to right. We 
drag the sweep-line through the state space, calculating the reachable states in 
front of the sweep-line and deleting states behind the sweep-line. All states on 
the sweep-line have the same progress measure. 

Fig. 3. The sweep-line method.



The full state space of the sliding window protocol with 5 packets has 28,438 
nodes. If the sweep-line method is applied, using our prototype implementa- 
tion, then at most 8,622 nodes are stored at any moment during state space 
generation. This means that the memory consumption measured in the number 
of nodes is reduced by almost 70%. Moreover, the time used for keeping track 
of progress measures and deleting states on-the-fly is more than compensated 
for by the time gained in faster insertion of new states in the state space. We 
will provide further statistics for the application of the sweep-line method on 
the sliding window protocol in Sect. 6, where we also give other examples of 
applications of the sweep-line method. 
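
The informal description in this section can be turned into a small executable sketch. The Python code below is an illustration under stated assumptions, not the prototype discussed in the paper: it assumes a total order on progress values, processes unprocessed states in order of increasing progress using a heap, and garbage collects after every processed state (which is simple but likely too costly in practice). The function name sweep_line_explore and the toy system (10 levels of 3 states each, with the level used as progress measure) are hypothetical.

```python
import heapq


def sweep_line_explore(initial, successors, progress):
    """Visit every reachable state once, deleting states behind the sweep-line.

    Returns (number of states visited, peak number of states stored).
    """
    stored = {initial}                             # states kept in memory
    unprocessed = [(progress(initial), initial)]   # min-heap ordered by progress
    visited, peak = 0, 1
    while unprocessed:
        _, s = heapq.heappop(unprocessed)
        visited += 1
        for t in successors(s):
            # The method relies on progress never decreasing along a transition.
            assert progress(t) >= progress(s)
            if t not in stored:
                stored.add(t)
                heapq.heappush(unprocessed, (progress(t), t))
        peak = max(peak, len(stored))
        if unprocessed:
            sweep = unprocessed[0][0]              # minimal unprocessed progress
            stored = {u for u in stored if progress(u) >= sweep}
    return visited, peak


LEVELS, WIDTH = 10, 3


def toy_successors(state):
    level, k = state
    nxt = [(level, (k + 1) % WIDTH)]               # cycle inside the level
    if level + 1 < LEVELS:
        nxt.append((level + 1, (k + 1) % WIDTH))   # move to the next level
    return nxt


visited, peak = sweep_line_explore((0, 0), toy_successors, lambda s: s[0])
print(visited, peak)
```

In this toy system all 30 reachable states are visited, while at most two levels (6 states) are held in memory at any moment.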



3 Progress Measures 

In this section we formalise the notion of progress measure which is the funda- 
mental concept underlying the sweep-line method. We assume that the systems 
we are considering can be characterised as a tuple $\mathcal{M} = (S, T, \Delta, s_0)$, where $S$
is the set of states, $T$ is the set of transitions, $\Delta \subseteq S \times T \times S$ is the transition
relation, and $s_0$ is the initial state. Most models of concurrent systems, including
CPN models, fall into this category of systems.

Let $s, s' \in S$ be two states and $t \in T$ a transition. If $(s, t, s') \in \Delta$ we say
that $t$ is enabled in $s$, and that the occurrence of $t$ in the state $s$ leads to the
state $s'$. This is also written $s \xrightarrow{t} s'$. A state $s_n$ is reachable from a state $s_1$
iff there exist states $s_2, s_3, \ldots, s_{n-1}$ and transitions $t_1, t_2, \ldots, t_{n-1}$ such that
$(s_i, t_i, s_{i+1}) \in \Delta$ for $1 \le i \le n-1$. If a state $s'$ is reachable from a state $s$ we
write $s \to^* s'$. For a state $s$, $reach(s) = \{ s' \in S \mid s \to^* s' \}$ denotes the set of
states reachable from $s$. The set of reachable states of $\mathcal{M}$ is then $reach(s_0)$. The
state space of a system is the directed graph $(V, E)$ where $V = reach(s_0)$ and
$E = \{ (s, t, s') \in \Delta \mid s, s' \in V \}$.

A progress measure specifies a partial order $(O, \sqsubseteq)$ on the states of the system.
A partial order $(O, \sqsubseteq)$ consists of a set $O$ and a relation $\sqsubseteq \; \subseteq O \times O$ which is
reflexive, transitive, and antisymmetric. Moreover, the partial order is required
to preserve the reachability relation of the system:







Søren Christensen, Lars Michael Kristensen, and Thomas Mailund



Definition 1. A progress measure is a tuple $\mathcal{P} = (O, \sqsubseteq, \psi)$ such that $(O, \sqsubseteq)$
is a partial order and $\psi : S \to O$ is a mapping from states into $O$ satisfying:

$\forall s, s' \in reach(s_0) : s \to^* s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$  □

It is worth noting that the definition of progress measure implicitly states
that for all $s \in reach(s_0)$ : $\psi(s_0) \sqsubseteq \psi(s)$, i.e., the initial state is minimal
among the reachable states with respect to the progress measure. We only require
$s \to^* s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$ for reachable states $s, s' \in reach(s_0)$. In general we
cannot determine whether $s \in reach(s_0)$ for arbitrary states $s \in S$ without
calculating the full state space. Hence in practice, a conservative approach is to
ensure the property for all states in $S$. In general we cannot determine whether
$s \to^* s'$ either, without calculating the state space, but for some systems it is
possible to determine statically from the model that for all transitions $t$ and all
states $s, s' \in S$ we have $s \xrightarrow{t} s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$, and hence by transitivity that
$s \to^* s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$. Since progress measures in the general case will be
user-specified, they can be erroneous. The mapping $\psi : S \to O$ could violate
$s \to^* s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$. However, since $s \to^* s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$ for all
$s, s' \in reach(s_0)$ iff $s \xrightarrow{t} s' \Rightarrow \psi(s) \sqsubseteq \psi(s')$ for all $s, s' \in reach(s_0)$ and all $t \in T$,
this property can easily be checked during the state space exploration. When we
process the enabled transitions in a state $s$, we check that all successor states
have a progress measure greater than or equal to that of $s$.
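
The run-time check described above only needs to look at single transitions. The following Python sketch shows one possible way to organise it; the function name check_progress_measure, the parameter names psi and leq, and the two toy successor functions are assumptions made for the illustration.

```python
def check_progress_measure(initial, successors, psi, leq):
    """Explore the reachable states and test psi on every single transition.

    Returns the first pair (s, s') with psi(s) not below psi(s'),
    or None if the progress measure holds on all reachable transitions.
    """
    seen = {initial}
    stack = [initial]
    while stack:
        s = stack.pop()
        for s2 in successors(s):
            if not leq(psi(s), psi(s2)):
                return (s, s2)
            if s2 not in seen:
                seen.add(s2)
                stack.append(s2)
    return None


def succ_ok(s):
    """Toy system: a counter that only moves forward from 0 to 5."""
    return [s + 1] if s < 5 else []


def succ_bad(s):
    """Same counter, plus one transition that jumps back from 3 to 1."""
    return succ_ok(s) + ([1] if s == 3 else [])


def identity(s):
    return s


def int_leq(a, b):
    return a <= b


ok = check_progress_measure(0, succ_ok, identity, int_leq)
bad = check_progress_measure(0, succ_bad, identity, int_leq)
print(ok, bad)
```

For the second system the check reports the offending transition from 3 to 1, so an erroneous user-specified progress measure is detected as soon as that transition is explored.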

All models have progress measures. A trivial progress measure is $O = \{\ast\}$
(the one-point set), $\sqsubseteq \; = \{(\ast, \ast)\}$ (the order on that set), and $\psi(s) = \ast$ for all $s \in
S$. However, this progress measure offers no reduction of the state space. Another
progress measure is $O = S_{SCC}$, the set of strongly connected components of the
state space, $\sqsubseteq \; = \; \to^*_{SCC}$, the reachability relation between the strongly connected
components, and $\psi(s) = SCC(s)$, i.e., $\psi$ maps a state into the strongly connected
component to which it belongs. This progress measure offers maximal reduction 
for the sweep-line method, but since in general we cannot compute the strongly 
connected component to which a state belongs without calculating the state 
space, this progress measure is of little practical interest. What is needed is a 
non-trivial progress measure that can be computed based on the individual states 
alone, i.e., no knowledge of the state space is required. One example of this is the 
progress measure of the sliding window protocol defined in the previous section. 



4 State Space Exploration 

Exploring the state space with the sweep-line method is based on the algorithm 
used for conventional state space construction. Figure 4 shows the standard 
algorithm for constructing a state space. It works on three sets: Nodes, the set 
of nodes/states in the state space; Edges, the set of edges in the state space; 
and Unprocessed, the set of states that have been reached so far but not yet
processed, i.e., whose successor states have not been calculated.

     1: Unprocessed.Add($s_0$)
     2: Nodes.Add($s_0$)
     3: while $\neg$ Unprocessed.Empty() do
     4:   $s \leftarrow$ Unprocessed.GetNext()
     5:   for all $(t, s')$ such that $s \xrightarrow{t} s'$ do
     6:     Edges.Add($s, t, s'$)
     7:     if $\neg$ Nodes.Contains($s'$) then
     8:       Nodes.Add($s'$)
     9:       Unprocessed.Add($s'$)
    10:     end if
    11:   end for
    12: end while

Fig. 4. Generic algorithm for state space exploration.

The sweep-line state space generation algorithm is derived from the standard
algorithm by adding garbage collection. At certain intervals we delete all
states whose progress measure is strictly less than the progress measures of
all states in Unprocessed, and at the same time we delete all edges connecting
deleted states. If the order is total, as was the case in Sect. 2, there will
be one minimal progress measure among the unprocessed states, and it suffices
to compare the progress measure of a state to that minimal progress measure.
Since in general $\sqsubseteq$ is only required to be a partial order, it is
possible for all unprocessed states to have different, incomparable progress
measures. Hence, in the worst case the progress measure of a state needs to be
compared to each of them in order to determine whether it can be deleted or
not.

When to garbage collect can be decided in different ways. Garbage collecting 
in each loop is likely to be too time consuming and the number of states that 
can be deleted in each iteration is likely to be small. Collecting at fixed intervals 
can suffer from the same problems, but it is less likely. If the intervals are chosen 
to be too large, however, there may not be sufficient memory to store all states 
calculated between two collections. This tradeoff can be adjusted dynamically, 
by determining when to collect based on available memory and/or the amount of 
memory reclaimed in previous collections. When memory is scarce, the interval
should be decreased; when little memory is reclaimed per collection, the interval
should be increased. In our prototype implementation (see Sect. 6), garbage
collection is done whenever a fixed, user-specified number of nodes has been
added to the state space. The results obtained with this simple strategy were 
quite satisfactory for our experiments, so we have not yet experimented with 
other strategies. 
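
The dynamic adjustment mentioned above was not implemented in the prototype.
Purely as an illustration, it might look like the following Python sketch. The
function name next_gc_interval, its parameters, and all thresholds are
assumptions made for this example and are not taken from the paper.

```python
def next_gc_interval(interval, free_fraction, reclaimed_fraction,
                     low_memory=0.10, low_yield=0.05, minimum=100):
    """Return the number of new states to generate before the next collection.

    free_fraction:      share of memory still available (0.0 to 1.0)
    reclaimed_fraction: share of stored states deleted by the last collection
    All thresholds are hypothetical tuning parameters.
    """
    if free_fraction < low_memory:
        # Memory is scarce: collect more often.
        return max(minimum, interval // 2)
    if reclaimed_fraction < low_yield:
        # Collections reclaim little: collect less often.
        return interval * 2
    return interval
```

Memory scarcity is tested first in this sketch, so that a low-yield collection
never lengthens the interval while memory is running out.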

To maximise the number of states that can be deleted in each garbage
collection, the minimal unprocessed states should have progressed as much as
possible. Therefore, the method GetNext on Unprocessed (which returns the
next state to process) should preferably always return a state with a minimal
progress measure. If the progress measure maps to a total order, as has been the
case for all our applications, then Unprocessed can be implemented as a
priority queue using the progress measure as the priority, and $\sqsubseteq$ as
the ordering. In this case a breadth-first generation based on progress measure
is obtained. With other progress measures, other data structures might be
needed to implement Unprocessed efficiently.
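
For a totally ordered progress measure, the combination of the algorithm in
Fig. 4, a priority queue, and periodic garbage collection can be sketched in
Python as below. This is a simplified illustration and not the prototype
itself: edges are not stored, the names sweep_line, psi and gc_interval are
assumptions, and protocol_successors is a hypothetical toy protocol in which a
sequence number serves as the progress measure.

```python
import heapq


def sweep_line(s0, successors, psi, gc_interval=2):
    """Visit every reachable state once while storing only states on or ahead
    of the sweep-line. Assumes psi is totally ordered and never decreases
    along a transition. Returns (states visited, peak number of states stored)."""
    nodes = {s0}                       # states currently kept in memory
    unprocessed = [(psi(s0), s0)]      # priority queue keyed by progress measure
    visited = 0
    peak = len(nodes)
    added_since_gc = 0
    while unprocessed:
        _, s = heapq.heappop(unprocessed)   # GetNext: minimal progress first
        visited += 1
        for _t, s2 in successors(s):
            if s2 not in nodes:
                nodes.add(s2)
                heapq.heappush(unprocessed, (psi(s2), s2))
                added_since_gc += 1
        peak = max(peak, len(nodes))
        if unprocessed and added_since_gc >= gc_interval:
            sweep = unprocessed[0][0]       # minimal progress among unprocessed
            # Delete every state strictly behind the sweep-line.
            nodes = {n for n in nodes if psi(n) >= sweep}
            added_since_gc = 0
    return visited, peak


def protocol_successors(s, last=10):
    """Hypothetical toy protocol: state = (sequence number, waiting for ack)."""
    n, waiting = s
    if not waiting:
        return [("send", (n, True))]
    moves = [("retransmit", (n, False))]
    if n < last:
        moves.append(("ack", (n + 1, False)))
    return moves


visited, peak = sweep_line((0, False), protocol_successors, psi=lambda s: s[0])
```

In this toy system the full state space has 22 states, whereas the sketch
never stores more than a handful at a time, because each state behind the
sweep-line can no longer be reached from any unprocessed state.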

It is worth observing that the progress measure ensures that all states of a
strongly connected component will be garbage collected at the same time. The
reason for this is that nodes in the same strongly connected component have the
same progress measure as a consequence of Def. 1. This means that an efficient
way to capture strongly connected components is to do this as an integrated part
of the garbage collection algorithm. This is interesting since strongly connected
components are used to check certain properties of systems in an efficient way.

5 Checking Properties 

The sweep-line method garbage collects nodes shortly after having created them.
Hence, to obtain verification results, properties must be checked on-the-fly. In
this section we show how the sweep-line method can be used to verify a number
of standard behavioural properties of systems. The properties considered do not
represent an exhaustive list of properties which can be verified with the
sweep-line method. Here, we have chosen a set of behavioural properties which in
our experience are often of interest for the analysis of systems. Combining the
sweep-line method with more general temporal logic model checking is beyond the
scope of this paper.

A deadlock is a state in which no transitions are enabled. Using the algorithm
in Fig. 4, the state $s$ could be examined between line 4 and line 5, and if there
are no enabled transitions in $s$ it should be reported as a deadlock.

Checking if a state satisfying a given predicate is reachable is straightforward.
The sweep-line method guarantees that each reachable state is visited at some
point. If one of the visited states satisfies the predicate we answer “yes”, if
none satisfy the predicate we answer “no”. Checking a safety property means
checking that all reachable states satisfy a given predicate. This amounts to
checking that the negation of the predicate is not reachable.
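
The deadlock and reachability checks can both be attached to the exploration
loop of Fig. 4. The Python sketch below shows where they would go. For brevity
it omits the garbage collection step, and the names check_on_the_fly,
chain_successors and predicate are assumptions introduced for this
illustration.

```python
def check_on_the_fly(s0, successors, predicate):
    """Explore from s0 and return (deadlocks, witness), where deadlocks lists
    the states without enabled transitions and witness is the first visited
    state satisfying predicate (None if no reachable state does)."""
    seen = {s0}
    unprocessed = [s0]
    deadlocks = []
    witness = None
    while unprocessed:
        s = unprocessed.pop()
        enabled = list(successors(s))
        # These checks sit between lines 4 and 5 of the algorithm in Fig. 4.
        if not enabled:
            deadlocks.append(s)
        if witness is None and predicate(s):
            witness = s
        for _t, s2 in enabled:
            if s2 not in seen:
                seen.add(s2)
                unprocessed.append(s2)
    return deadlocks, witness


def chain_successors(s):
    """Hypothetical toy system: 0 -> 1 -> 2 -> 3, where state 3 is a deadlock."""
    return [("step", s + 1)] if s < 3 else []


deadlocks, witness = check_on_the_fly(0, chain_successors, lambda s: s == 2)
# Safety property "s <= 3": its negation must not be reachable.
_, violation = check_on_the_fly(0, chain_successors, lambda s: s > 3)
```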

Checking that a given predicate is a home predicate means checking that from
all reachable states, it is possible to reach a state where the predicate holds. A
typical predicate could check if a given transition is enabled. Home predicates can
be decided using the strongly connected components, i.e., a predicate is a home
predicate iff each terminal strongly connected component contains at least one
state satisfying the predicate. Since all states in a strongly connected component
have the same progress measure, no part of a strongly connected component can
be garbage collected before the entire strongly connected component has been
computed. Checking this kind of property requires only that terminal strongly
connected components are analysed before they are deleted. When we garbage
collect, we calculate the strongly connected components of the states we are
about to delete, isolate the terminal strongly connected components and check
whether they contain a state satisfying the predicate.
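
The terminal-component criterion can be illustrated on a small explicit graph.
The Python sketch below applies Tarjan's algorithm to the whole graph at once,
rather than to the states about to be deleted in a garbage collection, which
is a simplification. The function names and the example graph are assumptions
made for this illustration.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps every node to a list of its successors."""
    index, low, stack, on_stack, components = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


def is_home_predicate(graph, predicate):
    """True iff every terminal component contains a state satisfying predicate."""
    for component in strongly_connected_components(graph):
        terminal = all(w in component for v in component for w in graph[v])
        if terminal and not any(predicate(v) for v in component):
            return False
    return True


# Hypothetical state space: {b, c} can be left via b -> d; {d, e} is terminal.
example = {"a": ["b"], "b": ["c", "d"], "c": ["b"], "d": ["e"], "e": ["d"]}
home_e = is_home_predicate(example, lambda v: v == "e")
home_c = is_home_predicate(example, lambda v: v == "c")
```

Here "is e" is a home predicate because e lies in the only terminal component,
while "is c" is not, since c can no longer be reached once the system has
moved to d.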

When a deadlock or a violation of some property is detected, a trace leading
to the deadlock or a state violating the property should be reported. With the
sweep-line method, reporting such a trace is complicated by the fact that states
between the initial state and the state at the end of the trace might have been
garbage collected. These states need to be re-computed. In some cases it is
possible to calculate transitions “backwards”, i.e., from a state $s$ it is
possible to calculate all states $s'$ and transitions $t$ such that
$s' \xrightarrow{t} s$. In such cases, a trace can be found by searching
backwards for the initial state. Since all reachable states have a progress
measure greater than or equal to that of the initial state, searching can
be stopped along a given path if the progress measure drops below this value.
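
When predecessors can be computed, the backward search with progress-based
pruning might be sketched as follows in Python. The function backward_trace,
the helper protocol_predecessors and the toy protocol are assumptions
introduced for this example, and a totally ordered progress measure is
assumed.

```python
from collections import deque


def backward_trace(target, s0, predecessors, psi):
    """Breadth-first search backwards from target to s0. Returns the trace as a
    list of (state, transition, next state) triples, or None if s0 is not found."""
    link = {target: None}      # state -> (transition, next state towards target)
    queue = deque([target])
    while queue:
        s = queue.popleft()
        if s == s0:
            trace = []
            while link[s] is not None:
                t, nxt = link[s]
                trace.append((s, t, nxt))
                s = nxt
            return trace
        for t, p in predecessors(s):
            # Prune: no reachable state has less progress than the initial state.
            if psi(p) < psi(s0):
                continue
            if p not in link:
                link[p] = (t, s)
                queue.append(p)
    return None


def protocol_predecessors(s):
    """Hypothetical toy protocol: state = (sequence number, waiting for ack)."""
    n, waiting = s
    if waiting:
        return [("send", (n, False))]
    return [("retransmit", (n, True)), ("ack", (n - 1, True))]


trace = backward_trace((2, False), (0, False), protocol_predecessors,
                       psi=lambda s: s[0])
```

In the toy protocol the predecessor (-1, True) of the initial state is never
explored, since its progress value lies below that of the initial state.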

In many cases, however, it is not possible to search backwards. In such cases,
the following scheme can be used. When garbage collecting, we make sure that
for any unprocessed state $s$ we have at least one predecessor $s'$ with a
progress measure strictly less than that of $s$ among the states we do not
garbage collect. In this way, when we need to calculate a trace leading to
state $s$, the intersection between Nodes and the set of predecessors of $s$ is
non-empty. In this intersection, there is a set of minimal states. We can start
a search for one of these, using the sweep-line method. When one is found, we
construct the last part of the trace as the path from this state to $s$. The
state we find will have a set of predecessors stored in Nodes, and we can start
a search for one of these. This can then be iterated until we have built a path
from $s_0$ to $s$. In each iteration, the distance to $s_0$ is shortened by at
least one, so the algorithm is guaranteed to terminate.

6 Experimental Results 

A prototype of the sweep-line method has been implemented based on the state
space tool of Design/CPN [4]. In this section we give a first evaluation of
the practicality of the sweep-line method by applying this prototype to three
examples. The first example is the sliding window protocol from Sect. 2. The
second example is a stop-and-wait communication protocol taken from [16]. The
third example is taken from the industrial case-study [3] in which state spaces
of timed CP-nets and the Design/CPN tool were used to validate vital parts
of the B&O BeoLink system. All results in this section were obtained using a
Pentium II 166 MHz PC with 160 MB of memory.

The prototype implementation uses a simple algorithm for initiating a gar- 
bage collection during the sweep: Whenever n new states have been added to 
the state space, a garbage collection is initiated. The garbage collection is im- 
plemented as a copying collector: when collecting, the states that should not be 
deleted are copied into a new state space. This new state space then becomes 
the current state space, and the old state space is deleted. This scheme was 
chosen for its simplicity, but it has the drawback that it requires space for two 
copies of the states that are not deleted. This problem can be avoided using 
other garbage collection techniques, but for our experiments the copy collection 
proved sufficient. 

Table 1. Experimental results - Sliding Window Communication Protocol.

          |  Full State Spaces  |  Sweep-Line Method  |     Reduction
  Packets |   States      Time  |   States      Time  |   States    Time
  --------+---------------------+---------------------+------------------
        5 |   28,438   0:02:39  |    8,622   0:01:26  |   69.7 %  45.9 %
       10 |   60,013   0:08:48  |    8,622   0:03:26  |   85.6 %  61.0 %
       15 |   91,588   0:21:15  |    8,622   0:05:26  |   90.6 %  74.4 %
       20 |  123,163   0:33:16  |    8,622   0:07:36  |   93.0 %  77.7 %
       25 |  154,738   0:51:10  |    8,622   0:09:22  |   94.4 %  77.9 %

Sliding Window Protocol. Table 1 lists statistics for the application of the
sweep-line method for different configurations of the sliding window protocol
from Sect. 2. The results were obtained by garbage collecting for each 2000 new
states. The table consists of four main columns. The Packets column gives the
configuration under consideration, i.e., the number of packets that the sender
is requested to send to the receiver. The Full State Spaces column gives the
number of states in the full state space and the CPU time (h:mm:ss) it took to
generate it. The Sweep-Line Method column gives the maximal number of states
stored during the sweep, and the time it took to make the sweep through the
state space. The Reduction column compares the sweep-line method to full state
spaces by giving the reduction obtained in terms of states which had to be
stored, and the reduction in run-time. E.g., for 5 packets a reduction of
69.7 % is obtained in the number of states, and a reduction of 45.9 % in
run-time.
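
As a worked check of how the Reduction column follows from the other two, the
Python snippet below recomputes the figures for the 5-packet configuration of
Table 1. The helper names to_seconds and reduction are assumptions introduced
for this illustration; the input numbers are taken from the table.

```python
def to_seconds(hms):
    """Convert a time written as h:mm:ss into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def reduction(full, sweep):
    """Relative saving of the sweep-line figure over the full figure, in percent."""
    return round(100.0 * (1 - sweep / full), 1)


# 5 packets: 28,438 states in 0:02:39 (full) versus 8,622 states in 0:01:26 (sweep).
state_reduction = reduction(28438, 8622)
time_reduction = reduction(to_seconds("0:02:39"), to_seconds("0:01:26"))
```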

Table 1 shows that the use of the sweep-line method saves both memory 
and time. Moreover, the savings obtained grew with the system configuration. 
It was to be expected that the sweep-line method would reduce the memory 
consumption since states are being deleted during generation of the state space. 
However, from the results listed in Table 1 it follows that for the sliding-window
protocol the sweep-line method is also faster than generating the full state
space. Hence, in this case the overhead added by having to delete states is
compensated for by having
significantly fewer states to compare with in the hash collision list when a new 
state has been generated and is to be inserted in the state space. 

Stop-and-wait Communication Protocol. The stop-and-wait protocol contains a
single sender and a single receiver. Data packets are to be transmitted from 
sender to receiver such that each packet is received exactly once, and in the 
right order. Each packet contains a sequence number. This number is increased 
for each new packet and the sequence number is never reset. Thus the sequence 
number of the packet currently being transmitted by the sender can be used as 
a progress measure. 

Table 2 lists statistics for the application of the sweep-line method for dif- 
ferent configurations of the stop-and-wait protocol. The results were obtained 
when garbage collecting after each 2000 new states. The table consists of the 
same four main columns as the table for the sliding window protocol. 

Table 2. Experimental results - Stop-and-Wait Communication Protocol.

          |  Full State Spaces  |  Sweep-Line Method  |     Reduction
  Packets |   States      Time  |   States      Time  |   States    Time
  --------+---------------------+---------------------+------------------
       10 |    2,576   0:00:04  |    2,002   0:00:04  |   22.3 %   0.0 %
       50 |   13,416   0:00:29  |    2,278   0:00:21  |   83.0 %  27.6 %
      100 |   26,966   0:01:20  |    2,278   0:00:43  |   91.6 %  46.2 %
      200 |   54,066   0:04:04  |    2,281   0:01:29  |   95.8 %  63.5 %
      300 |   81,166   0:08:36  |    2,282   0:02:17  |   97.2 %  73.4 %
      400 |  108,266   0:14:23  |    2,282   0:03:05  |   97.9 %  78.6 %
      500 |  135,366   0:21:57  |    2,282   0:03:58  |   98.4 %  81.9 %
     1000 |  270,866   1:21:10  |    2,284   0:08:51  |   99.2 %  89.1 %

BeoLink System. The Bang & Olufsen BeoLink system makes it possible to
distribute audio and video throughout a home via a network. The state space
analysis in [3] focused on the lock management protocol of the BeoLink system.
This protocol is used to grant devices exclusive access to various services in 
the system. The exclusive access is implemented based on the notion of a key. A 
device is required to possess the key in order to access services. When the system 
boots no key exists, and the lock management protocol is (among other things) 
responsible for ensuring that a key is generated when the system starts. It is the 
obligation of the so-called video or audio master device to ensure that new keys 
are generated when needed. Timed CP-nets were applied in [3] since timing is 
crucial for the correctness of the lock management protocol. Here we used the 
sweep-line method to verify that when the BeoLink system starts eventually a 
key is generated by the lock management protocol. 

The progress measure used for the CPN model of the BeoLink system is 
based on the time concept of CP-nets. The progress measure is based on the 
fact that timed CP-nets have a global clock giving the current model time, and 
that time cannot go backwards in a timed CP-net. That time always progresses 
is a general property of the time concept of CP-nets, and as a consequence, time 
can be used as a progress measure on timed CPN models of systems. This is an 
example of a progress measure being given by the formalism, rather than being 
specific to individual models. 

Table 3 lists statistics for the verification of the initialization phase of the
BeoLink system for different configurations. The Config column specifies the
configuration in question. Configurations with one video master are written in
the form VM: n, where n is the total number of devices in the system.
Configurations with one audio master are written in the form AM: n. The results
were obtained when garbage collecting after each 3000 new states.

Table 3. Experimental results - BeoLink System.

          |  Full State Spaces  |  Sweep-Line Method  |     Reduction
  Config  |   States      Time  |   States      Time  |   States    Time
  --------+---------------------+---------------------+------------------
  VM: 3   |    1,130   0:00:06  |    1,130   0:00:06  |    0.0 %   0.0 %
  AM: 3   |    1,839   0:00:11  |    1,839   0:00:11  |    0.0 %   0.0 %
  VM: 4   |   13,421   0:02:40  |    5,170   0:02:50  |   61.5 %  -6.0 %
  AM: 4   |   22,675   0:05:32  |    5,170   0:04:39  |   77.2 %  16.0 %
  VM: 5   |  164,170   2:30:27  |   35,048   1:08:28  |   78.7 %  54.5 %
  AM: 5   |  282,399   5:03:53  |   35,048   1:59:39  |   87.6 %  60.6 %

7 Conclusion

In this paper we have presented a sweep-line method for alleviating the state
explosion problem. The method relies on the notion of progress measure
combined with breadth-first state space generation. The experimental results
obtained with a prototype implementation of the method have been encouraging,
and demonstrated significant savings in memory as well as in time.

The aim of this paper has been to develop and specify a basic version of 
the sweep-line method, and we have shown how the basic sweep-line generation 
algorithm can be adapted to allow on-the-fly verification. Investigating how other 
model checking algorithms can be combined with the sweep-line method is a topic 
of future work. The constraint is that the sweep-line method relies inherently 
on breadth-first generation of the state space. The model checking procedure 
therefore needs to be breadth-first based in order to be compatible with the 
sweep-line method. Future work also includes investigating the combination of 
the sweep-line method and other state space reduction methods such as partial 
order reduction methods [21,19,22] and the symmetry method [6,14,5]. 

The sweep-line method is geared towards systems whose progress can be
quantified based on their states. The method does not work well for
fully or almost fully reactive systems, where most of the state space is strongly
connected, i.e., the state space has very few strongly connected components. In
fact, the number of nodes in the largest strongly connected component gives a
lower bound on the memory consumption of the sweep-line method. The
examples contained in this paper demonstrate, however, that there exist many
interesting and non-trivial systems for which a progress measure can be
specified, and for which the sweep-line method is a very efficient means of
analysis. A measure of progress seems likely to be present also in other systems.

A disadvantage of the sweep-line method is that when a violation of a prop- 
erty has been detected, a run-time expensive backwards search is required in or- 
der to provide an execution/counter example showing why the property does not 
hold. With other similar methods such as state space caching [9,12] and bit-state 
hashing [10,11], the counter example is immediately available on the depth-first
search stack. With state space caching, state space generation is expensive and 
counter example generation is inexpensive, whereas with the sweep-line method 
the opposite is true. On the other hand, since the sweep-line method relies on 
breadth-first generation it is more geared towards finding short counter examples 
which is not the case for the state space caching method. 







Two other methods for deleting states during state space generation have 
been given in [18] and [15]. The observation behind [18] is that a state can be 
deleted once all its predecessor states have been explored. The method in [18] 
relies on being able to compute the number of predecessor states. This is for 
instance possible when the transition relation of the system can be represented 
using BDDs [2]. The method in [15] combines partial-order reduction and bit- 
state hashing with the idea of deleting states which are invisible to the LTL-X 
temporal logic property to be verified. Invisible states are deleted on-the-fly 
during state space generation, and to alleviate the problem of revisiting states, a 
preprocessing phase based on bit-state hashing is used to compute approximate
information about the number of predecessors of states. This makes it possible 
to avoid blindly deleting states which could possibly be visited again. The sweep- 
line method and the methods in [18] and [15] all exploit different information to 
obtain the criteria for deleting states. The sweep-line method exploits a progress 
measure which is typically a property of the system to be verified, [18] exploits 
semantic information about the predecessors of each state, whereas [15] exploits 
the temporal logic property to be verified. 

We have seen that for timed CP-nets it is possible to define a progress measure 
based on the modelling formalism and which, as a consequence, is applicable 
to all timed CPN models of systems. In other cases the user is required to 
provide the progress measure as input to the sweep-line generation algorithm. It 
is therefore relevant to ask how difficult it is to come up with a progress measure. 
We claim that if there is progress present in the system, then the modeller has 
in most cases an intuition about this which can be formalised into a progress 
measure and provided to the tool. In the prototype implementation of the
sweep-line method in the Design/CPN state space tool, the full Standard ML [20]
programming language is available to the user for specifying progress measures,
which offers great flexibility. The provided
progress measure is also required to fulfill the property that all successors of a 
state s have a progress measure which is greater than or equal to the progress 
measure of s. If the user specifies a progress measure as input to the tool which 
does not have the required property, i.e., the progress is not actually present in 
the system, then this is not disastrous. Violations against the required property 
can be detected fully automatically by the tool during state space generation. 
In case a violation is detected, the state space generation can be stopped and 
a state and one of its successor states reported back to the user demonstrating 
why the provided progress measure does not fulfill the required property. 
Abstract. The explosion in the number of states due to several interacting
components limits the application of model checking in practice.
Compositional reasoning ameliorates this problem by reducing reasoning
about the entire system to reasoning about individual components. Such
reasoning is often carried out in the assume-guarantee paradigm: each
component guarantees certain properties based on assumptions about
the other components. Naive applications of this reasoning can be circular
and, therefore, unsound. We present a new rule for assume-guarantee
reasoning, which is sound and complete. We show how to apply it, in a
fully automated manner, to properties specified as synchronous timing
diagrams. We show that timing diagram properties have a natural
decomposition into assume-guarantee pairs, and liveness restrictions that
result in simple subgoals which can be checked efficiently. We have
implemented our method in a timing diagram analysis tool, which carries
out the compositional proof in a fully automated manner. Initial applications
of this method have yielded promising results, showing substantial
reductions in the space requirements for model checking.



1 Introduction 

Compositional reasoning [7] - reducing reasoning about a system to reasoning
about its components - has been an active area of research for nearly three
decades. Recently, it has gained further importance as a way of ameliorating
the state explosion problem in model checking. For example, given programs
$P_1$, $P_2$ and specification $T$, we would like to check whether the composed
system satisfies $T$ (written as $P_1 // P_2 \models T$). Since reasoning about
$P_1 // P_2$ directly only exacerbates the state explosion problem,
compositional reasoning techniques are designed to reason about $P_1$ in
isolation from $P_2$ (and vice versa) to draw conclusions about $P_1 // P_2$.
There are, however, several difficulties which must be overcome; foremost among
them are the task decomposition problem, the generation of auxiliary assertions
and the general applicability of the compositional method to the task at hand.

* Partially supported by NSF 980-4736, TARP 003658-0650-1999 and SRC 98-DP-388.
Firstly, task decomposition is necessary since it is unlikely that $P_1$ by
itself satisfies all of $T$: we would like to decompose $T$ into $T_1$ and $T_2$
such that $T = T_1 \land T_2$ and then show that $P_1 \models T_1$ and
$P_2 \models T_2$. Secondly, auxiliary assertions are usually necessary, since
$P_1$ may satisfy $T_1$ only when its environment behaves like $P_2$. To solve
this problem, assume-guarantee style reasoning adds auxiliary assertions, $Q_2$
(respectively $Q_1$), which represent assumptions about the behavior of $P_2$
($P_1$) as an environment for $P_1$ ($P_2$). Such auxiliary assertions must
often be generated by hand, however. Finally, naive compositional rules based on
this style of reasoning, for instance, $P_1 // P_2 \models T$ holds if
$P_1 // Q_2 \models T_1$ and $P_2 // Q_1 \models T_2$, are sound only for safety
properties.

In this paper, we first present a new rule for assume-guarantee reasoning,
which generalizes several earlier rules (cf. [15,1,3,12,13]), by removing the
sources of incompleteness in some of these rules, by using processes, instead of
temporal logic formulas, as specifications, and by allowing more general forms of
process definition and composition. The new rule extends the naive rule above
with a check for soundness. As it deals uniformly with processes, it fits in well
with a top-down refinement approach to designing systems. We show that this
rule is also complete, in that if $P_1 // P_2 \models T$, then it is possible to
prove this fact with our rule.

Next, we explore the benefits of applying this rule in the case where $T$ is
specified as a timing diagram. Timing diagrams are visual descriptions of process
behavior that are widely used in the hardware industry. We show that not only
is task decomposition a relatively simple problem for timing diagrams, but also
that it is possible to automatically generate auxiliary assertions directly from the
specification. Furthermore, we identify a large class of timing diagrams for which
the soundness check of the rule is always satisfied, and the auxiliary assertion
generation and, therefore, the model checking process is efficient - linear in
the size of the diagram and the structure. We have implemented our method
in a timing diagram analysis tool, Rtdt [4], which uses the tool COSPAN [8]
to discharge model checking subgoals. We report here on its application to a
memory controller and a PCI Interface Core; in both cases, we obtain substantial
reduction in the space used for model checking.

The organization of the paper is as follows: we describe our new rule and prove
its soundness and completeness in Section 2. The theory behind the application
of this rule to timing diagrams is presented in Section 3. Our experiments with
applying this rule are described in Section 4. We conclude the paper with a
description of related work in Section 5.

2 Assume-Guarantee Based Compositional Reasoning 

In this section, we first present the naive compositional reasoning rule and ex- 
plain why it is unsound. We then present our new rule, and show that it is both 
sound and complete. We begin by defining some basic concepts: processes, com- 
position, and closure. Although the eventual application of our rule is to finite 
state processes, we develop it in a more general setting. 
2.1 Preliminaries

For a non-empty set of typed variables $V$, an assignment of values to variables
in $V$ is called a $V$-state. A $V$-sequence $x = x_0, x_1, \ldots$ is a
non-empty sequence (finite or infinite) of $V$-states. The length of $x$ (number
of states in $x$) is written as $|x|$. We write $x[i..j]$, for $j \ge i$, to
denote the subsequence $x_i, \ldots, x_j$ and $x; y$ to denote concatenation of
a finite sequence $x$ to $y$. A language $L$ over $V$ is a set of finite or
infinite sequences of $V$-states. A $W$-sequence $x$, where $V \subseteq W$,
satisfies $L$ iff $x$ projected on to $V$ belongs to $L$. The term
$(\exists W : L)$ defines a language over $V \setminus W$. A
$(V \setminus W)$-sequence $x$ satisfies $(\exists W : L)$ iff there exists a
sequence $y$, with the same length as $x$, such that $y$ is in $L$ and $x$ and
$y$ differ only on the values of variables in $W$. For a language $L$ over $V$,
let $[L]$ mean that every $V$-sequence (finite or infinite) satisfies $L$. Thus,
for $L_1$ and $L_2$ over $V$, $[L_1 \Rightarrow L_2]$ denotes
$L_1 \subseteq L_2$.

A process $P$ is specified by a tuple $(V, I, R, F)$. $V$ is a non-empty set of
typed variables, partitioned into three sets: private variables $V^p$, interface
variables $V^i$ and external variables $V^e$. The variables $V'$, which are in
1-1 correspondence with $V$, represent values for $V$ in the next state. The set
of modifiable variables, $V^m$, is $V^p \cup V^i$. $I(V)$ is an initial
condition, $R(V, (V^m)')$ is a transition relation and $F(V)$ is a fairness
condition. A $V$-sequence $x$ is an execution of $P$ iff $I(x_0)$ and for all
$i$ such that $i + 1 < |x|$, $R(x_i, x_{i+1})$ holds. The set of finite
executions is denoted by $finexec(P)$. The language of $P$, $\mathcal{L}(P)$, is
the set of finite executions of $P$ together with those infinite executions of
$P$ that satisfy $F$. The observable language of $P$, denoted by
$\mathcal{L}^o(P)$, is the projection of its language on $V^i \cup V^e$. In
the rest of the paper, we assume that private variables of a process are distinct
from the variables of all other processes, since this does not affect the observable
language.

For processes $P$ and $A$, the relationship “$P$ implements $A$”, denoted by
$P \models A$, is defined only if $V^o(A) \subseteq V^o(P)$, and is defined as
$[\mathcal{L}^o(P) \Rightarrow \mathcal{L}^o(A)]$, which can be written as
$[\mathcal{L}(P) \Rightarrow (\exists V^p(A) : \mathcal{L}(A))]$. This matches
the usual definition when $A$ is an automaton, since a sequence over $V^p(A)$ is
a run of the automaton.

For a language $L$ on variables $V$, the closure of $L$, denoted by $cl(L)$, is a
language consisting of $V$-sequences $x$ where, for every $i < |x|$, there
exists a sequence $y$ such that $x[0..i]; y \in L$. For any process $P$, there
is a process $CL(P)$ with the property
$[\mathcal{L}^o(CL(P)) = cl(\mathcal{L}^o(P))]$. If $P$ is finite-state, $CL(P)$
is formed from $P$ by changing the fairness condition of $P$ to true.

A process $Q$ does not block process $P$ iff (i) any initial state of $P$ can be
extended to an initial state of $P // Q$, and (ii) for any reachable state of
$P // Q$, any transition of $P$ from that state can be extended to a joint
transition of $P // Q$. A process is machine closed iff every finite execution
can be extended to an infinite fair execution.

The composition of the processes $P_1 = (V_1, I_1, R_1, F_1)$ and
$P_2 = (V_2, I_2, R_2, F_2)$, denoted by $P_1 // P_2$, is the process
$P = (V, I, R, F)$, where $V = V_1 \cup V_2$, $V^p = V_1^p \cup V_2^p$,
$V^i = V_1^i \cup V_2^i$, $I = I_1 \land I_2$, $R = R_1 \land R_2$, and
$F = F_1 \land F_2$. The disjunction of the processes $P_1$ and $P_2$, denoted
by $P_1 + P_2$, is defined as the process $P = (V, I, R, F)$, where
$V = V_1 \cup V_2 \cup \{c\}$, $V^p = V_1^p \cup V_2^p \cup \{c\}$,
$V^i = V_1^i \cup V_2^i$, $I = (c \land I_1) \lor (\neg c \land I_2)$,
$R = (c' = c) \land ((c \land R_1) \lor (\neg c \land R_2))$, and
$F = (FG(c) \land F_1) \lor (FG(\neg c) \land F_2)$. The private variable $c$
serves to choose initially between the two processes. The following proposition
summarizes the properties of these constructions needed for the later proofs.

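Proposition 0. For processes $P_1$, $P_2$, $P$,

(a) $[finexec(P_1 // P_2) = finexec(P_1) \land finexec(P_2)]$,
    $[\mathcal{L}(P_1 // P_2) = \mathcal{L}(P_1) \land \mathcal{L}(P_2)]$, and
    $[\mathcal{L}^o(P_1 // P_2) = \mathcal{L}^o(P_1) \land \mathcal{L}^o(P_2)]$

(b) $[(\exists \{c\} : \mathcal{L}(P_1 + P_2)) = \mathcal{L}(P_1) \lor \mathcal{L}(P_2)]$

(c) $[\mathcal{L}^o(CL(P)) = cl(\mathcal{L}^o(P))]$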

This definition of processes and of composition is quite general: it includes
Moore and Mealy styles of definition as special cases, and processes in a
composition can modify shared variables. Interleaving composition can be defined
by adding a shared “turn” variable.

2.2 Compositional Reasoning Rules 

To show that $P_1 // P_2 \models T_1 // T_2$ holds, one may attempt to show that $P_1 \models T_1$
and $P_2 \models T_2$. This “non-circular” proof often does not work if the
components are tightly coupled, since $P_1$ may satisfy $T_1$ only in the presence of $P_2$.
Hence, several so-called “circular” proof rules have been proposed, of which this
is an example: to show $P_1 // P_2 \models T_1 // T_2$, show that (i) $P_1 // T_2 \models T_1$, and (ii)
$P_2 // T_1 \models T_2$. This rule can be shown to be sound for non-blocking safety
properties (i.e., for finite computations). It is, however, unsound for liveness properties.
To see this, consider the following instantiation.

process P1: var x: boolean; initially x=true or x=false; transition x'=y
process P2: var y: boolean; initially y=true or y=false; transition y'=x
property T1: eventually (x), property T2: eventually (y)

Although both hypotheses hold, it is not true that $P_1 // P_2 \models T_1 // T_2$, as the
computation where x and y are always false is a valid computation of $P_1 // P_2$.
In an attempt to fix this problem, several proposed rules (cf. [1,3]) replace
hypothesis (ii) with, say, $P_2 // CL(T_1) \models T_2$. Using the safety closure of $T_1$ prevents
any possibility of circular reasoning amongst liveness properties. On the other
hand, this makes it difficult to apply the rule when liveness properties are needed
as assumptions. We adopt a different strategy for fixing the problem: we use an
additional hypothesis that checks if the circular reasoning is sound. For
simplicity, we present this rule for the composition of two processes; it can be easily
extended to apply to any finite composition.
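To make the counterexample for the circular rule concrete, the following is a minimal Python sketch that enumerates the four initial states of the composition of P1 and P2 (transitions x'=y and y'=x) and records those from which "eventually x" or "eventually y" fails. The function names (`step`, `lasso`, `eventually`) are illustrative and not part of the paper; the sketch only examines the conclusion of the rule, not its hypotheses.

```python
from itertools import product

def step(state):
    """Joint transition of P1//P2: x' = y and y' = x."""
    x, y = state
    return (y, x)

def lasso(initial):
    """Unfold the unique computation from `initial` into (prefix, cycle)."""
    seen, trace, state = {}, [], initial
    while state not in seen:
        seen[state] = len(trace)
        trace.append(state)
        state = step(state)
    start = seen[state]
    return trace[:start], trace[start:]

def eventually(pred, prefix, cycle):
    """'eventually pred' holds iff some state of the lasso satisfies pred."""
    return any(pred(s) for s in prefix + cycle)

# Initial states (x, y) from which T1//T2 fails, i.e. x or y never becomes true.
violations = []
for init in product([False, True], repeat=2):
    prefix, cycle = lasso(init)
    t1 = eventually(lambda s: s[0], prefix, cycle)   # T1: eventually x
    t2 = eventually(lambda s: s[1], prefix, cycle)   # T2: eventually y
    if not (t1 and t2):
        violations.append(init)

print(violations)
```

The only violating initial state is the one where x and y are both false, which is the computation described in the text.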

Rule: To show that $P_1 // P_2 \models T$, find $Q_1$, $Q_2$ such that the following
conditions are satisfied.

C0 $V^{i}(Q_1) \subseteq V^{i}(P_1)$, $Q_1$ does not block $P_2$, and symmetrically for $Q_2$.

C1 $P_1 // Q_2 \models Q_1$, and $P_2 // Q_1 \models Q_2$

C2 $Q_1 // Q_2 \models T$

C3 Either $P_1 // CL(T) \models (T + Q_1 + Q_2)$, or $P_2 // CL(T) \models (T + Q_1 + Q_2)$

Note: Hypothesis C3 need not be checked when T is a safety property,
as $[\mathcal{L}^{e}(CL(T)) \Rightarrow \mathcal{L}^{e}(T)]$ holds in this case.

Theorem 0 (Soundness). The rule is sound for arbitrary $P_1$, $P_2$ and T.
Proof. We have to show that $P_1 // P_2 \models T$ follows from the conditions C0-C3.
This, by definition, is equivalent to showing that $[\mathcal{L}(P_1 // P_2) \Rightarrow \mathcal{L}^{e}(T)]$. By
the results in [2], any language L can be written as a conjunction of
the safety property cl(L) and the liveness property $(cl(L) \Rightarrow L)$. Based on this
characterization, we break up the proof into the following two parts.

Safety $[\mathcal{L}(P_1 // P_2) \Rightarrow cl(\mathcal{L}^{e}(T))]$, and
Liveness $[\mathcal{L}(P_1 // P_2) \wedge cl(\mathcal{L}^{e}(T)) \Rightarrow \mathcal{L}^{e}(T)]$
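As a brief check of the decomposition used here, the identity $L = cl(L) \wedge (cl(L) \Rightarrow L)$ can be verified directly. The only fact assumed beyond propositional reasoning is $L \Rightarrow cl(L)$, which follows from the definition of closure because every prefix of a sequence in L can be extended, by the rest of that sequence, to a member of L.

```latex
\begin{align*}
cl(L) \wedge (cl(L) \Rightarrow L)
  &= cl(L) \wedge (\neg cl(L) \vee L) \\
  &= cl(L) \wedge L \\
  &= L \qquad \text{(since } L \Rightarrow cl(L)\text{)}
\end{align*}
```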

In the following, let W be the private variables of $Q_1 // Q_2$.

Lemma 0. $[finexec(P_1 // P_2) \Rightarrow (\exists W : finexec(Q_1 // Q_2))]$

Proof Sketch. This follows from conditions C0 and C1 by induction on the
length of executions. □

First, we show the safety part by proving the equivalent (as $cl(\mathcal{L}(P))$ is the
set of executions of P) statement $[finexec(P_1 // P_2) \Rightarrow cl(\mathcal{L}^{e}(T))]$. Let U be the
private variables of T.

    $finexec(P_1 // P_2)$
$\Rightarrow$ ( by Lemma 0 )
    $(\exists W : finexec(Q_1 // Q_2))$
$\Rightarrow$ ( as $cl(\mathcal{L}(P))$ includes $finexec(P)$ )
    $(\exists W : cl(\mathcal{L}(Q_1 // Q_2)))$
$\Rightarrow$ ( by C2; monotonicity of cl )
    $(\exists W : cl(\mathcal{L}^{e}(T)))$
$\Rightarrow$ ( W contains private variables not occurring in T )
    $cl(\mathcal{L}^{e}(T))$



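Next, we show the liveness part.

    $\mathcal{L}(P_1) \wedge \mathcal{L}(P_2) \wedge cl(\mathcal{L}^{e}(T))$
$\Rightarrow$ ( by Proposition 0(c) )
    $\mathcal{L}(P_1) \wedge \mathcal{L}(P_2) \wedge \mathcal{L}^{e}(CL(T))$
$\Rightarrow$ ( by condition C3 )
    $\mathcal{L}(P_1) \wedge \mathcal{L}(P_2) \wedge \mathcal{L}^{e}(T + Q_1 + Q_2)$
$\Rightarrow$ ( by Proposition 0(b); $W \cup U \cup \{c\}$ consists of private variables )
    $(\exists W \cup U \cup \{c\} : \mathcal{L}(P_1) \wedge \mathcal{L}(P_2) \wedge (\mathcal{L}(T) \vee \mathcal{L}(Q_1) \vee \mathcal{L}(Q_2)))$
$\Rightarrow$ ( distributing $\wedge$ over $\vee$; Proposition 0(a) and condition C1 )
    $(\exists W \cup U \cup \{c\} : \mathcal{L}(T) \vee \mathcal{L}^{e}(Q_1 // Q_2))$
$\Rightarrow$ ( distributing $\exists$ over $\vee$; condition C2 )
    $(\exists W \cup U \cup \{c\} : \mathcal{L}(T)) \vee (\exists W \cup U \cup \{c\} : \mathcal{L}^{e}(T))$
$\Rightarrow$ ( $W \cup \{c\}$ consists of private variables not in T )
    $\mathcal{L}^{e}(T)$

□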



470 Nina Amla, E. Allen Emerson, Kedar Namjoshi, and Richard Trefler

Theorem 1 (Completeness-1). The rule is complete for non-blocking
processes $P_1$, $P_2$ that have disjoint interface variables.

Proof. Suppose that $P_1 // P_2 \models T$ holds. Let $Q_1 = P_1$ and $Q_2 = P_2$. As $Q_1$
is non-blocking and has disjoint interface variables from $P_2$, it satisfies the
condition C0; similarly for the symmetric case. Condition C1 is satisfied as
$P_1 // P_2 \models P_1$ and $P_1 // P_2 \models P_2$ hold trivially. Condition C2 is $P_1 // P_2 \models T$,
which is true by assumption. Condition C3 holds as $P_1 \models (T + P_1 + P_2)$ by
weakening. □

Theorem 2 (Completeness-2). The rule is complete for arbitrary processes.

Proof. Suppose that $P_1$, $P_2$, T are processes such that $P_1 // P_2 \models T$. Each $P_i$ can
be made non-blocking by adding a transition for each blocking condition to a
special state that has a self-loop. If $P_1$, $P_2$ have shared interface variables C, then
rename the variables C to $W_1$ and $W_2$ in the processes $P_1$ and $P_2$ respectively,
and modify T to $T'$, which also accepts computations that diverge from T by
differing on the values of $W_1$ and $W_2$ or by entering a blocking state. The result
of the $\models$ check is unchanged with the new processes. From the previous theorem,
therefore, there is a proof of $P_1 // P_2 \models T$. □

3 Compositional Reasoning with Timing Diagrams 

In the previous section, we gave a sound and complete rule for assume-guarantee 
based compositional reasoning. In this section we show how to apply that rule to 
specifications in the form of timing diagrams. By focusing on timing diagrams, 
which are a highly regular specification formalism, we obtain several benefits. 
Firstly, for a large class of timing diagrams the soundness check C3 in the rule 
follows directly as a consequence of the expressiveness of the formalism and so 
can be dispensed with. Secondly, we take advantage of the fact that many timing 
diagrams have efficient model checking procedures. Finally, we also show that 
the generation of helper assertions is not only automatic but efficient for a large 
class of timing diagrams. 

Timing diagrams are a graphical notation commonly used to specify timing
behavior of hardware systems. Synchronous Regular Timing Diagrams (SRTDs)
[4] are a class of timing diagrams that correspond to a subset of the ω-regular
languages. SRTDs have a formal syntax and semantics and there are efficient,
polynomial time algorithms for model checking SRTDs (see [4] for details).
These facts make SRTDs an effective formal specification notation.

An SRTD is specified by describing a number of waveforms with respect to a
given clock. The clock waveform is a sequence of boolean values ({0, 1}), where
the value toggles at consecutive points. A change in the clock value from 0 to 1
is called a rising edge, while a change from 1 to 0 is called a falling edge. The
waveforms are sequences of values over {0, 1, X, D}, where X indicates a
don't-care value, and D a don't-care transition. A change in value of a waveform (e.g.,
0 → 1) must occur at rising or falling edges of the clock. The waveforms of an
SRTD are partitioned into an initial precondition part that does not contain any
don't-care transitions and the following postcondition part. In turn, the
postcondition may be partitioned using pause markers. For example, in the SRTD
of Figure 1, there are three signals, A.p, B.q and A.r, the clock, a precondition
marker, etc.

A don't-care value (X) is used to specify that the value at a point is unknown,
unspecified or unimportant. A maximal sequence of don't-care transition values
(D) on a waveform must be preceded by a definite boolean value b, and followed
by the value ¬b. The sequence of D values indicates that the transition from b
to ¬b occurs exactly once in the specified interval. A pause specifies that there
is a break in explicit timing at that point, i.e. the value of the signals, except
the clock, remains unchanged for an arbitrary but finite period of time. At each
pause point, there must be at least one signal whose waveform has a definite
change of value relative to the following point. This signal indicates the end of
the pause. One such signal is designated as the “owner” of the pause.
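The reading of X and D symbols described above can be illustrated with a small Python sketch that checks a concrete 0/1 sequence against a single pause-free waveform. This is an illustration under stated assumptions, not the construction of [4]: the function name `satisfies` is hypothetical, pauses and the clock are not modelled, and the waveform is assumed to be well formed (every run of D is preceded by a definite value b and followed by 1-b).

```python
def satisfies(waveform, values):
    """Check a concrete 0/1 sequence against a pause-free waveform.

    `waveform` is a list over {0, 1, "X", "D"}; `values` is a list over {0, 1}.
    """
    if len(waveform) != len(values):
        return False
    i, n = 0, len(waveform)
    while i < n:
        sym = waveform[i]
        if sym == "X":                       # don't-care value: anything goes
            i += 1
        elif sym == "D":                     # maximal run of don't-care transitions
            j = i
            while j < n and waveform[j] == "D":
                j += 1
            b = waveform[i - 1]              # definite value before the run
            path = [b] + list(values[i:j]) + [1 - b]
            changes = sum(1 for u, v in zip(path, path[1:]) if u != v)
            if changes != 1:                 # b must flip to 1-b exactly once
                return False
            i = j
        else:                                # definite value 0 or 1
            if values[i] != sym:
                return False
            i += 1
    return True
```

For the waveform `[0, "D", "D", 1]`, the sequences `[0, 0, 1, 1]`, `[0, 1, 1, 1]` and `[0, 0, 0, 1]` are accepted, while `[0, 1, 0, 1]` is rejected because the value changes three times.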







Fig. 1. Annotated Synchronous Regular Timing Diagram 



An SRTD defines an ω-regular language. In [4], it is shown that we can
construct regular expressions for the precondition $T_{pre}$ and the postcondition
$T_{post}$ of an SRTD T. In the remainder of the paper, we use $T_{pre}$ ($T_{post}$) to
denote both the syntactical definition of precondition (postcondition) and its
associated regular expression.

An infinite computation σ satisfies an SRTD T (written $\sigma \models T$) if and only if
every finite segment of σ that satisfies the precondition is immediately followed
by a segment that satisfies the postcondition of the diagram. The precondition,
however, may be satisfied in an overlapping manner, which leads to two distinct
notions of satisfaction, overlapping and non-overlapping semantics.

Definition 0 (Overlapping Semantics). An infinite computation σ satisfies
an SRTD T under the overlapping semantics ($\sigma \models_{o} T$) iff every occurrence
of $T_{pre}$ in σ is followed by an occurrence of $T_{post}$. Formally, this is true iff
$\sigma \in \neg(U^{*}; T_{pre}; \neg T_{post})$, where U is the set of valuations to the signals in T.

To define non-overlapping semantics, it is convenient to assume that there
is an auxiliary proposition p such that for all sequences σ, p is true at the ith
point iff $T_{pre}$ is satisfied by a prefix of the suffix sequence starting at point i.

Definition 1 (Non-overlapping Semantics). An infinite computation σ
satisfies an SRTD T under the non-overlapping semantics ($\sigma \models_{n} T$) iff every
occurrence of $T_{pre}$ that does not overlap an occurrence of $T_{pre}$ or $T_{post}$ is immediately
followed by an occurrence of $T_{post}$. This is true iff
$\sigma \in ((\neg p)^{*}; T_{pre}; T_{post})^{\omega} + ((\neg p)^{*}; T_{pre}; T_{post})^{*}; (\neg p)^{\omega}$.



Proposition 1. For any SRTD T, $\sigma \models_{o} T$ implies $\sigma \models_{n} T$.



3.1 Translation Algorithms 

In order to use SRTDs as a specification language in a compositional model
checking paradigm we need to augment the above definitions of SRTDs with
some information about the modularity of the design being verified. This is
achieved by defining an ownership function $O : S \rightarrow N$ that maps each signal
to the implementation module that controls it, where S is the set of signals and
N is a set of module names. The ownership function O can be used to partition
the SRTD T into fragments $T_1, \ldots, T_n$. The fragment $T_i$ consists of $T_{pre}$, and
only those waveforms in $T_{post}$ that are owned by module i. An SRTD fragment
may not be a well-formed SRTD since a fragment may contain a pause whose
pause owner is in another fragment. In Figure 1, the ownership function O maps
signals A.p and A.r to module A and B.q to module B, and we have one fragment
consisting of waveforms p and r and another with waveform q.
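The partition induced by the ownership function can be sketched in a few lines of Python. The representation is an assumption made for illustration: a diagram is given as a precondition object plus a dictionary of postcondition waveforms keyed by signal name, and the waveform contents below are placeholders, since Figure 1 itself is not reproduced here. The name `split_into_fragments` is hypothetical.

```python
from collections import defaultdict

def split_into_fragments(pre, post, owner):
    """Map each module to (pre, postcondition waveforms owned by that module).

    Every fragment keeps the whole precondition `pre`; only the
    postcondition waveforms are divided according to `owner`.
    """
    owned = defaultdict(dict)
    for signal, waveform in post.items():
        owned[owner[signal]][signal] = waveform
    return {module: (pre, waves) for module, waves in owned.items()}

# Ownership function for the signals named in the text.
owner = {"A.p": "A", "B.q": "B", "A.r": "A"}
# Placeholder waveforms (hypothetical values, for illustration only).
post = {"A.p": [0, 1, 1], "B.q": [1, "X", 0], "A.r": [0, "D", 1]}
frags = split_into_fragments("T_pre", post, owner)
```

With this ownership function, the fragment for module A holds the waveforms of A.p and A.r, and the fragment for module B holds the waveform of B.q.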

We present an algorithm that translates an SRTD into a non-deterministic
ω-automaton (ω-NFA) for the complement of the SRTD property under the
non-overlapping semantics; the construction for the overlapping case is similar and
is described in [4]. Then, we give an algorithm that constructs a process that
generates the non-overlapping language of the SRTD fragments.

To construct an ω-NFA $A_T$ for the complement of the timing diagram
language of T, we proceed as follows. First, we construct a deterministic automaton
$A_{pre}$ from $T_{pre}$ that accepts at the first point on a string where the
precondition holds. We do so by creating a non-deterministic automaton that accepts
the language $U^{*}; T_{pre}$ and determinizing it, so that it enters an accepting state
at every point on an input string where $T_{pre}$ holds. We then eliminate outgoing
edges from accepting states of this automaton. The number of reachable states
in the resulting DFA can be exponential in the length of the precondition if
the precondition has don't-care values. Otherwise, there are only linearly many
reachable states, as the reachable part of the DFA is just the automaton for the
string matching problem, which can be constructed efficiently (cf. [6]).
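For a precondition without don't-care values, the string-matching automaton mentioned above can be sketched as follows. This is the textbook construction written naively for clarity (faster constructions exist), not the implementation used in [4]. As a simplifying assumption, each input symbol stands for one valuation of the signals at one clock point; `build_matcher` and `first_match` are illustrative names.

```python
def build_matcher(pattern, alphabet):
    """Transition table of the string-matching DFA for `pattern` (a tuple).

    State q means: the longest prefix of `pattern` that is a suffix of the
    input read so far has length q. There are len(pattern) + 1 states.
    """
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            seen = pattern[:q] + (a,)
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of what has been seen.
            while k > 0 and pattern[:k] != seen[-k:]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta

def first_match(pattern, alphabet, word):
    """Index of the first point at which `pattern` has just been read, or None."""
    delta, q = build_matcher(pattern, alphabet), 0
    for i, a in enumerate(word):
        q = delta[q][a]
        if q == len(pattern):
            return i   # accepting states have no outgoing edges: stop here
    return None
```

The table has one state per prefix of the pattern, which is the linear bound stated in the text; stopping at the first accepting point mirrors the removal of outgoing edges from accepting states.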
Next, for each signal i we construct an ω-DFA that tracks the waveform
for signal i over the length of the postcondition. This automaton checks
at each clock point that the waveform has the specified value. For a don't-care
transition, the automaton maintains an extra bit that records whether the
transition has occurred. For a pause, the automaton goes into a “waiting” state,
where it checks that the value of the signal remains unchanged, and which it
leaves when the pause owner signal changes value. The automaton for signal i
accepts a computation iff either the waveform pattern is incorrect at some point,
or if signal i is the owner of the kth pause in T and the automaton stays in the
waiting state for pause k forever.

The automaton $A_T$ works in the following manner: from the initial state, it
runs $A_{pre}$ on the input until this accepts; then it guesses a failing postcondition
signal i and runs the automaton for signal i, accepting if this accepts. If the
automaton for signal i terminates (so the postcondition holds for signal i), $A_T$
returns to its initial state.

Theorem 3. (Correctness) For any SRTD T and infinite sequence σ, $\sigma \models_{n} T$
iff $\sigma \notin L(A_T)$.

The size of an SRTD T is the product of the number of signals and the number
of clock points.

Theorem 4. (Model Checking Complexity) For a process M and an SRTD
T, under the non-overlapping semantics, the time complexity of model checking
is linear in the size of M and $T_{post}$, and exponential in the size of $T_{pre}$.

Theorem 5. For a process M and an SRTD T such that $T_{pre}$ does not
contain don't-care values, the time complexity of model checking under the
non-overlapping semantics is linear in the size of M and T.



Theorem 6. For a process M and an SRTD T, the time complexity of model 
checking under the overlapping semantics is linear in the size of M and T. 

These constructions can be modified easily to construct similar automata for 
SRTD fragments; the modification consists of choosing the failing postcondition 
signal only amongst the postcondition signals of the fragment. 



3.2 Automatic Construction of Helper Processes 

We now present an algorithm that constructs a helper process $Q_j$ that
generates the non-overlapping language corresponding to the fragment $T_j$ of the
diagram. The process $Q_j$ works as follows. It sets each signal i in $T_j$
nondeterministically until the precondition holds, then it generates values for the signals
of $T_j$ as specified in the postcondition. For a don't-care value, the output is chosen
nondeterministically. For a don't-care transition, the point at which the
transition occurs is chosen nondeterministically as well. If the process is the owner of
a pause, it nondeterministically decides when to generate this event and
maintains the current value till that point. The process has a fairness constraint that
forces this event to occur within a finite period. Otherwise, it maintains its value
until the event that signals the end of the pause occurs, without any requirement
for termination.

Proposition 2. (Correctness) For any SRTD fragment $T_j$, the corresponding
helper process $Q_j$ is non-blocking, and σ is a computation of $(// j : Q_j)$ iff
$\sigma \models_{n} T$.

The key feature of this construction is that, for every pause only the 
process that includes the signal owning the pause has a fairness constraint en- 
forcing the occurrence of the pause breaking event. This ensures non-interference 
between the fairness conditions, which is the essence of the soundness check in 
our compositional rule. 

Theorem 7. (Non-interference) For SRTD T under the non-overlapping
semantics, the corresponding processes $Q_1, \ldots, Q_n$, where n > 1, and computation
σ, $\sigma \in cl(\mathcal{L}^{e}(Q_1 // \ldots // Q_n))$ implies $\sigma \in \mathcal{L}^{e}(Q_1 + \cdots + Q_n)$.

Proof Idea. If σ is in $cl(\mathcal{L}^{e}(Q_1 // \ldots // Q_n))$, it must satisfy the waveform
pattern at each point. If it is not in $\mathcal{L}^{e}(Q_1 + \ldots + Q_n)$, this can only be because
σ never produces the pause breaking event of a pending pause. But such a pause
is owned by a particular $Q_i$; hence, σ is a computation of the $Q_j$'s, $j \neq i$. □

Theorem 8. For SRTD T with corresponding processes $Q_1, \ldots, Q_n$, the
number of states of $Q_1 // \cdots // Q_n$ can be exponential in the size of T.

For linear timing diagrams, those with no overlapping don't-care transitions,
no don't-care values at any pause and no don't-care values in the precondition,
we have the following theorem.

Theorem 9. For linear SRTD T and the corresponding processes $Q_1, \ldots, Q_n$,
the number of states of $Q_1 // \ldots // Q_n$ is bounded by $O(|T|)$.

3.3 Compositional Model Checking of SRTDs

In this section, we will describe a proof methodology that uses SRTDs as
the property T in the proof rule in Section 2. We would like to show that
$P_1 // P_2 \models_{n} T$, where T is an SRTD (respectively, $P_1 // P_2 \models_{o} T$). By our
construction in Section 3.2, we know that any SRTD T can be automatically
decomposed into helper processes $Q_1$ and $Q_2$ relative to an ownership function. In
order to apply the compositional rule with these choices for the $Q_i$, we need
only check conditions C1 and C3, as conditions C0 and C2 are true by
construction. In the non-overlapping case, condition C3 need not be checked, as it follows
from Theorem 7. Thus, the only condition to be checked is C1. The details of
this check are described in the following section.

4 Applications

We have incorporated the algorithms described in the previous sections into
the Rtdt tool [4]. Rtdt has a user-friendly editor that allows a designer to
create and edit SRTDs and a translator that compiles the SRTDs into
ω-automata. Rtdt forms an easy to use interface to the verification tool COSPAN
[8]. COSPAN is based on the automata-theoretic, language containment
approach to model checking, where both the implementation and the specification
are specified as ω-automata.

COSPAN checks $A \models B$ by considering only the infinite fair executions.
In order to check inclusion for the finite executions as well, we utilize machine
closure. If A is machine closed, any finite execution x of A can be extended to
an infinite fair execution; thus, if the COSPAN check is successful, x matches
some finite computation of B. The alternative is to use COSPAN's facilities for
checking finite computations, but this requires the product of A and B to be
constructed twice, once for each check. The machine closure method turns out
to be better, as in some of our examples, processes are trivially machine closed.
We added the ability to check machine closure to COSPAN.
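The machine closure property can be illustrated on an explicit finite state graph. The sketch below is not COSPAN's check; it assumes, for illustration only, that fairness is a Büchi-style condition (some state in a given set `fair` must be visited infinitely often) and that the transition relation is given as a successor dictionary. Under these assumptions, a process is machine closed iff every reachable state can reach a fair state that lies on a cycle. The names `reach` and `machine_closed` are hypothetical.

```python
def reach(succ, sources):
    """States reachable from `sources` in zero or more steps."""
    seen = set(sources)
    stack = list(seen)
    while stack:
        s = stack.pop()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def machine_closed(succ, init, fair):
    """True iff every finite execution extends to an infinite fair one.

    `succ` maps a state to its successors, `init` is an iterable of initial
    states, and `fair` is a set of states to be visited infinitely often.
    """
    reachable = reach(succ, init)
    # Fair states lying on a cycle are the ones that can recur forever.
    recurring = {f for f in fair & reachable
                 if f in reach(succ, succ.get(f, ()))}
    return all(reach(succ, [s]) & recurring for s in reachable)
```

For the graph `{0: [0, 1], 1: [1]}` with initial state 0, the check fails when only state 0 is fair (an execution that moves to state 1 can never return), and succeeds when every state is fair, which corresponds to a process without fairness constraints and a total transition relation.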

In our current implementation, we use the non-overlapping semantics since it
requires that we only check condition C1. We would like to take advantage of the
linear-time (Theorems 5, 6) model checking algorithms to discharge the obligation
$P_1 // Q_2 \models Q_1$ (similarly for the other obligation) in C1. We use Proposition 1 to
replace the more expensive check $P_1 // P_2 \models_{n} T$ by the computationally cheaper
check $P_1 // P_2 \models_{o} T$.

We used Rtdt in conjunction with COSPAN to verify two systems. The first
is a synchronous memory access controller and the second is Lucent's
Synthesizable PCI Interface Core.

4.1 Memory Access Controller 

The memory access controller system has an arbiter that provides arbitration 
between two user processes and a memory controller that controls three target 
processes. The user processes may non-deterministically request a transaction 
and the arbiter grants one user permission to initiate the transaction. That 
user process may then issue a memory instruction by asserting either the read 
or write line and setting the address bus. The target whose tag matches the 
address awakens, services the request, then asserts the ack line on completion. 

We verified that this system satisfied both read and write memory
transactions formulated as SRTDs. Table 1 presents the verification statistics of both
the compositional and non-compositional approaches. In Table 1, Arb and Mem
refer to the arbiter and memory controller implementation processes and Arb'
and Mem' are the automatically generated helper processes. mc(Arb//Mem')
and mc(Arb'//Mem) refer to the machine closure check performed by COSPAN.
T_A (T_M) is the SRTD fragment that corresponds to process Arb (Mem). Table
1 indicates that the compositional checks are more efficient than model
checking Arb//Mem |= T directly. The cost of checking Arb//Mem' |= T_A is more
than that of checking Arb'//Mem |= T_M because most of the signals in the
SRTDs for both the read and write transactions belonged to the arbiter.

Model Checking        Number of   Number of          BDD Size   Space      Time
Task                  Variables   Reachable States              (MBytes)   (seconds)

SRTD for the read transaction
Arb//Mem |= T         260         2.5e+06            50084      22         73
mc(Arb//Mem')         114         1.9e+06            14772      0          2
mc(Arb'//Mem)         86          1.9e+04            14793      0          3
Arb'//Mem |= T_M      129         1.1e+05            17993      6          23
Arb//Mem' |= T_A      201         1.1e+06            34861      14         46

SRTD for the write transaction
Arb//Mem |= T         258         2.6e+06            54834      22         77
mc(Arb//Mem')         112         1.0e+06            14551      0          2
mc(Arb'//Mem)         99          3.8e+04            15432      0          4
Arb'//Mem |= T_M      106         1.1e+05            16854      2          11
Arb//Mem' |= T_A      220         7.3e+05            42844      17         67

Table 1. Verification Statistics for Memory Access Controller Design



4.2 Lucent’s PCI Synthesizable Core 

The second example is the Lucent Technologies PCI Interface Core, which is a
set of building blocks that bridges an industry standard PCI Bus interface to a
high performance F-Bus. The F-Bus supports multiple masters and slaves and
there are separate master and slave interfaces to the PCI Bus. The PCI Interface
Core is designed to be fully compatible with the PCI Local Bus specification [14].

In previous work [4], we used Lucent's PCI Bus Functional Model [5], which
is a sophisticated environment that was developed to test the PCI Interface
Core for functionality and compliance with the PCI specification. The
Functional Model consists of the PCI Core blocks and abstract models for both the
PCI Bus and the F-Bus. This model has about 1500 bounded state variables
and was too large for model checking directly. We therefore restricted our
verification efforts to a part of this design called pcim-core that deals with basic
PCI functionality. The pcim-core process consists of a master controller mcntrl, a
slave controller scntrl, a configuration process config and an address multiplexer
admux. In addition there is an environment process pcim-ENV that contains
all the inputs to the pcim-core process. We added a number of constraints on
pcim-ENV to reduce the size of the state space. These constraints were property
specific and were different for each property we checked.



Model Checking        Number of   Number of          BDD Size   Space      Time
Task                  Variables   Reachable States              (MBytes)   (seconds)

SRTD Burst Property 1
MC'//SC//Env |= T_S   293         5.2e+05            158490     14         302
MC//SC'//Env |= T_M   79          1.2e+07            44066      3          40
MC//SC//Env |= T      335         4.4e+08            273140     20         511

SRTD Burst Property 2
MC'//SC//Env |= T_S   291         3.8e+05            115488     9          124
MC//SC'//Env |= T_M   74          9.9e+06            42436      3          40
MC//SC//Env |= T      331         1.8e+08            241792     18         430

SRTD Non Burst Property 1
MC'//SC//Env |= T_S   127         2.5e+28            587771     93         5281
MC//SC'//Env |= T_M   58          1.4e+09            77411      3          74
MC//SC//Env |= T *    -           -                  6725219    342        138110

* did not complete due to shortage of space

Table 2. Verification Statistics for PCI Synthesizable Core Design



We formulated a number of properties as SRTDs by looking at the timing
diagrams found in the PCI specification [14] and the PCI Core User's manual
[5]. These SRTDs were defined over signals controlled by mcntrl and scntrl. We
used Rtdt to automatically construct the helper processes MC' and SC' and
the property automata T_M and T_S. In Table 2, ENV refers to the composition
of pcim-ENV, config and admux, while MC and SC refer to mcntrl and scntrl
respectively. Machine closure was trivially satisfied since the pcim-core process
did not contain any fairness.

The basic bus transfer on the PCI is a burst, which is composed of an
address phase followed by one or more data phases. In the non-burst mode, each
address phase is followed by exactly one data phase. The data transfers in the
PCI protocol are controlled by three signals PciFrame, PciIrdy and PciTrdy. The
master of the bus drives the signal PciFrame to indicate the start and end of
a transaction. PciIrdy is asserted by the master to indicate that it is ready to
transfer data. Similarly the slave uses PciTrdy to signal that it is ready for data
transfer. Data is transferred between master and slave when both PciIrdy and
PciTrdy are asserted on a rising clock edge. The PciStop signal is used by the
slave to indicate termination of the transaction and the PciDevsel signal is used
to indicate the chosen device. The first property in Table 2 stated that “in an
ongoing transaction, once the PciStop signal is asserted, the PciTrdy and PciDevsel
signals remain constant until the data phase completes (PciIrdy is deasserted)”.
The second property specified that “if PciFrame is deasserted when both PciIrdy
and PciTrdy are asserted then the data phase completes successfully”. The final
property specified the non-burst mode, “if PciFrame is asserted for exactly one
clock cycle and PciIrdy, PciDevsel and PciTrdy are eventually asserted then in
the next clock cycle the transaction ends”. Table 2 indicates that the
compositional checks are far more efficient than the corresponding non-compositional
checks. The non-compositional check for the non-burst property ran out of
memory; the numbers shown in Table 2 are the BDD size, space and time just before
memory exhaustion. The slave controller scntrl has a lot of interaction with both
config and admux processes and this resulted in these processes being pulled into
the cone of influence. This is reflected in the significant disparity in the numbers
for the two compositional checks.



5 Related Work and Conclusions 



As mentioned in the introduction, compositional reasoning for concurrently ac- 
tive processes has been the subject of much work over the past three decades. 
Our first contribution in this paper is the development of a sound and complete 
rule for reasoning about arbitrary processes, including those with fairness con- 
straints. Earlier work (cf. [15,1,3,12,13]) either applies only to restricted kinds 
of processes or temporal logic formulas, or proposes incomplete rules. Our rule 
extends a simple reasoning rule that is known to be sound for safety properties 
with an additional soundness check for liveness properties. Thus, in a sense, the 
rule isolates the difficulties with reasoning about liveness in the soundness check. 

The possibility of using timing diagrams for compositional verification ap- 
pears to have been first recognized in a paper by Josko [10] on modular reason- 
ing. This paper, however, uses timing diagrams only for illustrative purposes. In 
later work (cf. [9]), a compositional verification methodology proposed in [11] is 
used to verify timing diagrams. This work uses timing diagrams as a convenient 
notation for expressing temporal properties - the assume-guarantee reasoning 
is left to the verifier. In contrast, our work shows how assume-guarantee pairs
can be generated mechanically from timing diagram specifications, resulting in
a completely automated compositional verification method.

In our work, we show that timing diagram specifications in the form of
SRTDs are naturally decomposable into assume-guarantee properties about the
components of the system. We also show that, although timing diagrams can
express liveness properties, the naive compositional reasoning rule can be applied
safely, as the additional soundness check always succeeds for the non-overlapping
semantics. We show how to apply the compositional rule in a fully automated
manner. Our experiments with the memory controller and the PCI interface core
show that compositional reasoning can indeed be done successfully in this way,
producing substantial savings in the time and space required for the verification.
Although, in these examples, the natural decomposition of the timing diagram
property suffices for generating the helper process, it is possible that this will not
be true in some cases. Thus, heuristics for automatically generating helper processes
may be needed, which we leave for future work.

Abstract. This paper develops an efficient algorithm for determining
when one system is capable of simulating the behavior of another. The
method combines an iterative algorithm for computing behavioral
preorders with an algorithm that simultaneously computes the bisimulation
equivalence classes of the systems in question. Experimental data
indicate that the new routine dramatically outperforms the best-known
algorithm for computing simulation, even when the systems are minimized
with respect to bisimulation before the simulation algorithm is invoked.



1 Introduction 

A traditional problem in the verification of concurrent systems is the
following: given two processes A and B, does B simulate A [Mil71]? The
resulting simulation ordering has numerous practical motivations, both in its own
right as a refinement / approximation ordering [BBLS92,DGG97,Jon91,LV95]
and as a vehicle on which to base the definitions of other refinement
orderings [BHR84,DNH83]. Indeed, efficient algorithms for computing the simulation
ordering underpin algorithms for computing relations such as trace inclusion and
the failures/must preorder [CH93].

Despite its utility, not much attention has been paid to algorithms for
computing the simulation ordering for finite-state systems. Bloom and Paige [BP95]
present a global routine that runs in time 0 (miU 2 + where nii and 

represent the number of states and transitions in the two systems being checked. 
Essentially the same algorithm was discovered independently in [HHK95], and 
similar ideas may be found in [CC95,CS90]. Celikkan [Cel95] defines an on-the-fly 
algorithm of comparable complexity. 

This paper develops a new technique for computing the simulation order- 
ing that combines the fixpoint calculation techniques of [BP95] with the fast 
bisimulation-minimization algorithm due to Paige and Tarjan [PT87,Fer90]. One
well-known way to improve the performance of a simulation checker is first to 
minimize the systems in question with respect to bisimulation in order to reduce 
the number of states that must be considered. By intertwining the computation 
of the bisimulation equivalence classes with the simulation relation, our approach 
exploits the benefits of minimization while avoiding the complete computation 
of equivalence classes if this is unnecessary. 



T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 480-495, 2001. 
© Springer- Verlag Berlin Heidelberg 2001 

2 Background

In this paper systems will be modeled as labeled transition systems (LTSs). 

Definition 1. A labeled transition system is a triple $(S, A, \rightarrow)$
where $S$ is a set of states, $A$ a set of actions, and
$\rightarrow \subseteq S \times A \times S$ the transition relation.

States may be seen as “configurations” the system may enter, while actions
represent system activities that can cause state changes. We write
$s \xrightarrow{a} s'$ in lieu of $(s, a, s') \in \rightarrow$, and we
sometimes abuse terminology by referring to a tuple $(S, A, \rightarrow, s_I)$,
where $s_I \in S$ is the start state, as a labeled transition system.

As we make extensive use of binary relations, we introduce some terminology
here. If $R \subseteq S \times S$ is a binary relation over set $S$ then we
write $R^{-1} = \{(s', s) \mid (s, s') \in R\}$ for the inverse of $R$. Also,
if $s \in S$ then we use $R(s)$ to represent the set
$\{s' \mid (s, s') \in R\}$, and if $T \subseteq S$ we define
$R(T) = \bigcup_{s \in T} R(s)$.

Definition 2. Let $(S, A, \rightarrow)$ be an LTS, and let
$R \subseteq S \times S$ be a relation. Then:

1. $R$ is a simulation if for every $(s_1, s_2) \in R$ and $a \in A$, whenever
$s_1 \xrightarrow{a} s_1'$ then there is an $s_2'$ such that
$s_2 \xrightarrow{a} s_2'$ and $(s_1', s_2') \in R$.

2. $R$ is a bisimulation if both $R$ and $R^{-1}$ are simulations.
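
To make Definition 2 concrete, the following Python sketch checks the
simulation condition on a finite LTS and computes the maximal simulation by a
plain greatest-fixpoint iteration. It is only an illustration and is not one of
the algorithms developed in this paper; the function names and the encoding of
the transition relation as a set of (source, action, target) triples are
assumptions made for the example.

```python
from itertools import product

def matches(trans, rel, s1, s2):
    """True if every step s1 -a-> p1 is answered by some s2 -a-> q1 with (p1, q1) in rel."""
    return all(
        any(q == s2 and b == a and (p1, q1) in rel for (q, b, q1) in trans)
        for (p, a, p1) in trans if p == s1
    )

def is_simulation(trans, rel):
    """Check the condition of Definition 2 for every pair in rel."""
    return all(matches(trans, rel, s1, s2) for (s1, s2) in rel)

def maximal_simulation(states, trans):
    """Start from S x S and delete violating pairs until nothing changes."""
    rel = set(product(states, states))
    changed = True
    while changed:
        changed = False
        for (s1, s2) in list(rel):
            if not matches(trans, rel, s1, s2):
                rel.discard((s1, s2))
                changed = True
    return rel

# Hypothetical three-state example: p and q loop on action "a", r is deadlocked.
example_states = {"p", "q", "r"}
example_trans = {("p", "a", "p"), ("q", "a", "q")}
example_rel = maximal_simulation(example_states, example_trans)
```

In this example the deadlocked state r is simulated by every state, while p is
not simulated by r because r cannot answer the a-step of p.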

It is easy to establish that for any LTS there is a maximal simulation,
$\preceq$, and bisimulation, $\sim$, and that the former is a preorder while
the latter is an equivalence relation. The following states an obvious
connection between $\preceq$ and $\sim$.

Theorem 1. Let $(S, A, \rightarrow)$ be an LTS, with $s_1, s_2, s_3 \in S$. Then:

1. If $s_1 \sim s_2$ and $s_2 \preceq s_3$ then $s_1 \preceq s_3$.

2. If $s_1 \preceq s_2$ and $s_2 \sim s_3$ then $s_1 \preceq s_3$.

This result has practical implications for computing $\preceq$, since it
indicates that LTSs may be minimized with respect to $\sim$ before calculating
$\preceq$.

The notion of simulation may be extended to two LTSs as well. Let
$T_1 = (S_1, A_1, \rightarrow_1)$ and $T_2 = (S_2, A_2, \rightarrow_2)$ be
LTSs. Then a simulation from $T_1$ to $T_2$ is a relation
$R \subseteq S_1 \times S_2$ satisfying the following for every
$(s_1, s_2) \in R$ and $a \in A_1$:

if $s_1 \xrightarrow{a}_1 s_1'$ then there is an $s_2'$ such that
$s_2 \xrightarrow{a}_2 s_2'$ and $(s_1', s_2') \in R$.

A maximal simulation $\preceq$ from $T_1$ to $T_2$ exists, and if
$s_1 \in S_1$ and $s_2 \in S_2$ then we write $s_1 \preceq s_2$ when these
states are in this relation. If $S_1 \cap S_2 = \emptyset$ then it is easy to
show that $s_1 \preceq s_2$ in the sense just described if and only if $s_1$
and $s_2$ are related by the maximal simulation in the single transition system
$T = (S_1 \cup S_2, A_1 \cup A_2, \rightarrow_1 \cup \rightarrow_2)$. If
$T_1 = (S_1, A_1, \rightarrow_1, s_1)$ and
$T_2 = (S_2, A_2, \rightarrow_2, s_2)$ have start states $s_1$ and $s_2$
indicated, then we write $T_1 \preceq T_2$ if $s_1 \preceq s_2$.

3 The Relational Coarsest KA-Partition Problem

We are interested in determining algorithmically whether $T_1 \preceq T_2$,
where $T_1 = (S_1, A, \rightarrow_1, s_1)$ and
$T_2 = (S_2, A, \rightarrow_2, s_2)$ are LTSs. To simplify the presentation, we
consider a restricted version of the problem in which transition systems are
unlabeled. In what follows a(n unlabeled) transition system is a pair $(S, E)$
where $S$ is a set of states and $E \subseteq S \times S$ is the transition
relation. On occasion, we designate a start state $s_I \in S$ and call
$(S, E, s_I)$ a transition system. Transition systems may be seen as labeled
transition systems whose action set $A$ contains a single action. Note that in
$(S, E)$, $E$ is a binary relation over $S$.

In this section we show how calculating $\preceq$ can be reduced to solving the
Relational Coarsest KA-Partition Problem. To define this problem, we first
review the Relational Coarsest Partition Problem [PT87], whose solution
corresponds to computing the equivalence classes of $\sim$ over a single
transition system.

The Relational Coarsest Partition Problem (RCPP). The statement of the RCPP 
uses the following terminology. 

Definition 3. Let $(S, E)$ be a transition system.

1. A partition of $S$ is a collection $\{B_1, \ldots, B_n\}$ of disjoint
nonempty subsets of $S$ such that $S = \bigcup_{i=1}^{n} B_i$. Each $B_i$ in a
partition $P$ is called a block of $P$.

2. A partition $P$ refines a partition $P'$ ($P \leq P'$) if for every
$B_i \in P$ there is a $B_j' \in P'$ such that $B_i \subseteq B_j'$. In this
case we say that $P'$ is coarser than $P$.

3. Let $S_1, S_1' \subseteq S$. Then $S_1$ is stable with respect to $S_1'$ if
either $S_1 \cap E^{-1}(S_1') = \emptyset$ or $S_1 \subseteq E^{-1}(S_1')$. A
partition $P$ of $S$ is stable with respect to a set $S' \subseteq S$ if each
$B_i \in P$ is stable with respect to $S'$. A partition $P$ is stable with
respect to another partition $P'$ if every $B_i \in P$ is stable with respect
to every $B_j' \in P'$. A partition $P$ is self-stable if it is stable with
respect to itself.

Intuitively, a set $S_1$ is stable with respect to $S_1'$ if either no state in
$S_1$ has a transition into $S_1'$ ($S_1 \cap E^{-1}(S_1') = \emptyset$) or
every state in $S_1$ has at least one transition into $S_1'$
($S_1 \subseteq E^{-1}(S_1')$). It is easy to see that $\leq$ defines a partial
order over the set of partitions of $S$ and that a coarsest self-stable
partition of $S$ is guaranteed to exist. The RCPP may now be defined as follows.

Given: Transition system $(S, E)$ with $|S| < \infty$.

Compute: The coarsest self-stable partition $P$ of $S$.

One may show that any self-stable partition of $S$ is a bisimulation and that
the blocks in the coarsest self-stable partition of $S$ are the equivalence
classes of $\sim$.
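
The RCPP can be solved by repeated splitting. The Python sketch below is a
deliberately simple refinement loop in the spirit of the “naive” method of
[PT87], not the efficient algorithm; the function names and the encoding of $E$
as a set of (source, target) pairs are assumptions made for illustration, and
$S$ is assumed to be nonempty.

```python
def preimage(E, target):
    """E^{-1}(target): the states with at least one transition into target."""
    return {s for (s, t) in E if t in target}

def is_stable(block, splitter, E):
    """Stability in the sense of Definition 3, item 3."""
    pre = preimage(E, splitter)
    return block.isdisjoint(pre) or block <= pre

def coarsest_self_stable_partition(S, E):
    """Refine {S} until every block is stable with respect to every block."""
    partition = [frozenset(S)]
    changed = True
    while changed:
        changed = False
        for splitter in partition:
            pre = preimage(E, splitter)
            refined = []
            for block in partition:
                inside, outside = block & pre, block - pre
                if inside and outside:      # block is unstable: split it in two
                    refined += [inside, outside]
                    changed = True
                else:
                    refined.append(block)
            if changed:                     # restart with the refined partition
                partition = refined
                break
    return set(partition)

# Hypothetical example: 1 -> 2 -> 3 <- 4, where state 3 is deadlocked.
example_E = {(1, 2), (2, 3), (4, 3)}
example_partition = coarsest_self_stable_partition({1, 2, 3, 4}, example_E)
```

Here states 2 and 4 end up in the same block, since both can make exactly one
step into the deadlocked state 3.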

The Relational Coarsest KA-Partition Problem. Theorem 1 suggests one way to
improve the efficiency of computing whether or not $T_1 \preceq T_2$: minimize
both $T_1$ and $T_2$ with respect to $\sim$ before calculating $\preceq$ using
e.g. the algorithm in [BP95]. Doing so entails using a preprocessing step to
compute the equivalence classes of $\sim$ for each of $T_1$ and $T_2$. Our goal
is to find a way to compute bisimulation classes and a simulation relation
simultaneously, thereby eliminating the need for fully computing equivalence
classes when this is unnecessary. Our method involves associating auxiliary
information in the form of a set of “potentially simulating states” with each
equivalence class of states in the “lower” transition system. This auxiliary
information will also be recorded in terms of equivalence classes of states in
the “upper” transition system. When a lower equivalence class is split,
auxiliary information must be altered appropriately. To make these notions
precise, we define the Relational Coarsest KA-Partition Problem, which is an
alteration of the RCPP introduced above.

Definition 4. Let $T_1 = (S_1, E_1)$ and $T_2 = (S_2, E_2)$ be transition
systems.

1. A kernel-auxiliary pair (KA-pair) has form $(B, X)$ where
$B \subseteq S_1$ and $X \subseteq S_2$. We write
$(B, X) \subseteq (B', X')$ if $B \subseteq B'$ and $X \subseteq X'$. We often
refer to $B$ as the kernel set of $(B, X)$ and $X$ as the auxiliary set.

2. A set $P$ of KA-pairs is a kernel-auxiliary partition (KA-partition) from
$T_1$ to $T_2$ if $P_1 = \{B \mid (B, X) \in P\}$ is a partition of $S_1$.

3. A KA-partition $P$ refines KA-partition $P'$ ($P \leq P'$) if for every
$(B, X) \in P$ there is a $(B', X') \in P'$ such that
$(B, X) \subseteq (B', X')$.

4. KA-pair $(B, X)$ is stable with respect to KA-pair $(B', X')$ if either
$B \cap E_1^{-1}(B') = \emptyset$, or $B \subseteq E_1^{-1}(B')$ and
$X \subseteq E_2^{-1}(X')$. A KA-partition $P$ is stable with respect to
KA-pair $(B', X')$ if every $(B, X) \in P$ is stable with respect to
$(B', X')$. KA-partition $P$ is stable with respect to KA-partition $P'$ if $P$
is stable with respect to every KA-pair in $P'$. KA-partition $P$ is
self-stable if it is stable with respect to itself.

Note that if $P$ is a self-stable KA-partition from $T_1$ to $T_2$ and
$(B, X), (B', X') \in P$, then either no state in $B$ has a transition into
$B'$, or every state in $B$ has a transition into $B'$ and every state in $X$
has a transition into $X'$. When $P$ is a self-stable KA-partition, the set
$\{B \mid (B, X) \in P\}$ is a self-stable partition.

Every KA-partition $P$ defines a relation $R(P) \subseteq S_1 \times S_2$ given
by: $(s_1, s_2) \in R(P)$ if and only if there exists $(B, X) \in P$ such that
$s_1 \in B$ and $s_2 \in X$. The following is an easy consequence of
Definition 4 and Theorem 1.

Theorem 2. Let $T_1 = (S_1, E_1)$ and $T_2 = (S_2, E_2)$ be transition systems
with $s_1 \in S_1$ and $s_2 \in S_2$. Then there is a unique coarsest
self-stable KA-partition $P_{\max}$ from $T_1$ to $T_2$, and
$s_1 \preceq s_2$ if and only if $(s_1, s_2) \in R(P_{\max})$.
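
The stability condition of Definition 4 and the relation $R(P)$ are easy to
state operationally. The following Python sketch is an illustration only: the
function names are hypothetical, KA-pairs are encoded as pairs of frozensets,
and $E_1$, $E_2$ are sets of (source, target) pairs.

```python
def preimage(E, target):
    """E^{-1}(target): the states with at least one transition into target."""
    return {s for (s, t) in E if t in target}

def ka_stable(pair, splitter, E1, E2):
    """Is KA-pair (B, X) stable with respect to KA-pair (B', X')?"""
    (B, X), (Bp, Xp) = pair, splitter
    pre_B = preimage(E1, Bp)
    if B.isdisjoint(pre_B):                 # no state of B enters B'
        return True
    return B <= pre_B and X <= preimage(E2, Xp)

def is_self_stable(P, E1, E2):
    """A KA-partition is self-stable if it is stable w.r.t. each of its pairs."""
    return all(ka_stable(p, c, E1, E2) for p in P for c in P)

def relation_of(P):
    """R(P): all (s1, s2) with s1 in B and s2 in X for some (B, X) in P."""
    return {(s1, s2) for (B, X) in P for s1 in B for s2 in X}

# Hypothetical example: lower system a -> b, upper system x -> y.
E1, E2 = {("a", "b")}, {("x", "y")}
coarse = {(frozenset({"a", "b"}), frozenset({"x", "y"}))}
stable = {(frozenset({"a"}), frozenset({"x"})),
          (frozenset({"b"}), frozenset({"x", "y"}))}
```

In this example the one-pair KA-partition is not self-stable, while the refined
one is, and its relation says that a is simulated by x only and that b is
simulated by both x and y.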

The Relational Coarsest KA-Partition Problem may now be stated as follows. 

Given: Transition systems $T_1 = (S_1, E_1)$, $T_2 = (S_2, E_2)$, with
$|S_1|, |S_2| < \infty$.
Compute: The coarsest self-stable KA-partition $P$ from $T_1$ to $T_2$.

The statement of this problem does not mention partitions of the state set of 
the “upper” transition system. The following corollary indicates that auxiliary 
sets can be efficiently represented as unions of bisimulation-equivalence classes. 

Corollary 1. Let $T_1 = (S_1, E_1)$ and $T_2 = (S_2, E_2)$ be transition
systems, let $P_{\max}$ be the coarsest self-stable KA-partition on $T_1$ and
$T_2$, and let $Q$ be the coarsest self-stable partition over $S_2$. Then for
any $(B, X) \in P_{\max}$ and $C \in Q$, either $C \subseteq X$ or
$X \cap C = \emptyset$.

4 Computing the Relational Coarsest KA-Partition

This section presents two approaches to constructing the relational coarsest
KA-partition on two systems. The first is based on the “naive” relational
coarsest partition algorithm of [PT87], and we include it here to illustrate
the basic operations that both algorithms must perform. The second builds on
the sophisticated “three-way splitting” approach also found in [PT87].

4.1 A Naive Approach 

The naive algorithm uses a partition-refinement strategy. Starting with the
coarsest possible KA-partition, KA-pairs are repeatedly “split” until the
KA-partition becomes self-stable. The basic operation in the algorithm involves
splitting the KA-pairs in a KA-partition $P$ with respect to a KA-pair
$C' = (B', X')$.



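split($C'$, $P$) =

 1  $P' := \emptyset$
 2  for every $(B, X) \in P$ do
 3    if $B \cap E_1^{-1}(B') \neq \emptyset$ then
 4      if $B \not\subseteq E_1^{-1}(B')$ then
 5        $(B_1, X_1) := (B \cap E_1^{-1}(B'),\; X \cap E_2^{-1}(X'))$
 6        $(B_2, X_2) := (B - E_1^{-1}(B'),\; X)$
 7        $P' := P' \cup \{(B_1, X_1), (B_2, X_2)\}$
 8      else
 9        $(B_1, X_1) := (B,\; X \cap E_2^{-1}(X'))$
10        $P' := P' \cup \{(B_1, X_1)\}$
11    else $P' := P' \cup \{(B, X)\}$
    return($P'$)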



The crucial insight underlying this operation occurs in lines 5-6. In this
situation the kernel set, $B$, of KA-pair $(B, X)$ must be split because it is
not stable with respect to the kernel set of $C'$. Because states in $B_1$ have
transitions into $B'$, the “auxiliary set”, $X_1$, of states that potentially
simulate those in $B_1$ must also have transitions into $X'$. The auxiliary set
$X_2$ of $B_2$ does not have to satisfy this requirement, since states in $B_2$
do not have transitions into $B'$. Note that both $(B_1, X_1)$ and
$(B_2, X_2)$ are stable with respect to $C'$.

We call $C'$ a splitter for $P$ if $P \neq$ split($C'$, $P$) (note this means
that $P$ is not stable with respect to $C'$). The naive algorithm works as
follows.



1  $P := \{(S_1, S_2)\}$
2  while $P$ is not stable with respect to some $C' = (B', X') \in P$ do
3    $P :=$ split($C'$, $P$)
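
The split operation and the naive loop translate almost line by line into
executable form. The Python sketch below is a direct, unoptimized reading of
the pseudocode and makes no attempt to meet the complexity bounds discussed
later; the function names and the encoding of KA-pairs as pairs of frozensets
are assumptions made for illustration.

```python
def preimage(E, target):
    """E^{-1}(target): the states with at least one transition into target."""
    return {s for (s, t) in E if t in target}

def split(splitter, P, E1, E2):
    """split(C', P): refine every KA-pair of P with respect to C' = (B', X')."""
    Bp, Xp = splitter
    pre_B, pre_X = preimage(E1, Bp), preimage(E2, Xp)
    result = set()
    for (B, X) in P:
        if B & pre_B:                           # line 3: some state of B enters B'
            if not B <= pre_B:                  # line 4: kernel set must be split
                result.add((B & pre_B, X & pre_X))   # line 5
                result.add((B - pre_B, X))           # line 6
            else:                               # line 9: only prune the auxiliary set
                result.add((B, X & pre_X))
        else:                                   # line 11: pair is untouched
            result.add((B, X))
    return result

def naive_ka_partition(S1, E1, S2, E2):
    """Split with respect to KA-pairs of P until no pair is a splitter."""
    P = {(frozenset(S1), frozenset(S2))}
    while True:
        for C in P:
            Q = split(C, P, E1, E2)
            if Q != P:                          # C was a splitter for P
                P = Q
                break
        else:                                   # no splitter left: P is self-stable
            return P

# Hypothetical example: lower system a -> b, upper system x -> y.
example_P = naive_ka_partition({"a", "b"}, {("a", "b")}, {"x", "y"}, {("x", "y")})
```

For this small example the result pairs a with {x} and b with {x, y}, which
matches the simulation ordering one would compute by hand.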



It will be convenient in what follows to view the execution of our KA-partition
algorithms in terms of a tree whose nodes are labeled with KA-pairs. A node has
children if the KA-pairs labeling the children are the result of applying a
split operation to the label of the node. Thus a node may have two children (if
its kernel set is split in lines 5-6) or one child (if its kernel set remains
unchanged but its auxiliary set is pruned in line 9). If a node labeled by
$(B, X)$ has two children, we assume the left child is labeled by
$(B_1, X_1)$ (line 5) and the right by $(B_2, X_2)$ (line 6). An invariant in
this tree is that the right child of every two-child node has the same
auxiliary set as its parent. We refer to such a tree as a partition tree. Note
that the leaves of this tree represent the current KA-partition; when the
algorithm terminates the leaves constitute the coarsest self-stable
KA-partition.

The correctness of the naive algorithm relies on the following observations,
which are adaptations of similar ones found in [PT87].

1. If $P' \leq P$ and $P$ is stable with respect to $C'$, then so is $P'$.

2. If $P$ is self-stable, then $P$ is stable with respect to any $P'$ such
that $P \leq P'$.

3. If $P \leq P'$ then split($C'$, $P$) $\leq$ split($C'$, $P'$) for any $C'$.

4. split is commutative: split($C_1$, split($C_2$, $P$)) = split($C_2$,
split($C_1$, $P$)).

To analyze the complexity of this procedure we first introduce the following
notation. Let $\mathbb{S}$ refer to the set of bisimulation-equivalence classes
of transition system $(S, E)$; thus $|\mathbb{S}|$ represents the number of
such equivalence classes.

The loop in the naive algorithm executes at most $|\mathbb{S}_1| \cdot |S_2|$
times, since each bisimulation-equivalence class has an auxiliary set that can
only decrease $|S_2|$ times. Furthermore, each execution of the loop can be
performed in $|E_1| + (|\mathbb{S}_1| \cdot |E_2|)$ time. The first term counts
the total amount of time over all splitters for updating kernel sets in the
KA-pairs, while the second reflects the time for updating the auxiliary sets.
In addition, only the current KA-partition needs to be stored. This leads to
the following.

Theorem 3. The naive algorithm computes the relational coarsest KA-partition
in $O(|\mathbb{S}_1| \cdot |S_2| \cdot (|E_1| + (|\mathbb{S}_1| \cdot |E_2|)))$
time and $O(|\mathbb{S}_1| \cdot |S_2|)$ space.



4.2 An Improved Algorithm 

The algorithm just given uses the “naive” partition-refinement strategy of [PT87] 
as a basis for computing the coarsest self-stable KA-partition; it also makes no 
attempt to exploit bisimulation equivalence in the “upper” transition system. 
These observations suggest two avenues for an improved routine. 

1. Use the “three-way splitting” partition-refinement algorithm of [PT87]. 

2. Maintain equivalence classes of states in the auxiliary sets of KA-pairs. 

This section shows how these ideas may be combined into a single efficient pro- 
cedure. We begin by briefly reviewing the three-way splitting idea of [PT87]. 

Partition-refinement and three-way splitting. Paige and Tarjan [PT87] exploit
Hopcroft's “process the smaller half” strategy for minimizing deterministic
state machines [AHJ74] to give a more efficient algorithm for solving the RCPP.
The main idea is to split a partition with respect to two splitters by only
processing the transitions entering the smaller of the two. This approach may
split equivalence classes into three pieces, and we thus refer to it as
“three-way splitting.”

The key insight behind three-way splitting is as follows. Let $T = (S, E)$, and
consider block $B$ in partition $P$ of $S$. Suppose $B$ is stable with respect
to a set (former block) $C \subseteq S$, and suppose further that $C$ has been
split into $C_1$ and $C_2$ and $B$ must now be split with respect to these. If
$B \subseteq E^{-1}(C)$, then splitting $B$ with respect to both $C_1$ and
$C_2$ yields (up to) three new equivalence classes.

$B_{11} = B \cap E^{-1}(C_1) \cap E^{-1}(C_2)$

$B_{12} = (B \cap E^{-1}(C_1)) - E^{-1}(C_2)$

$B_2 = (B - E^{-1}(C_1)) \cap E^{-1}(C_2)$

$B_{11}$ contains states from $B$ having transitions into both $C_1$ and
$C_2$, $B_{12}$ contains states from $B$ having transitions into $C_1$ but not
$C_2$, while $B_2$ contains states from $B$ with no transitions into $C_1$ but
transitions into $C_2$. Note that $B = B_{11} \cup B_{12} \cup B_2$.

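Algorithmically, [PT87] gives a way to compute $B_{11}$, $B_{12}$ and $B_2$ by
scanning the smaller of $C_1$ and $C_2$. To achieve this one must know, for
each state $s$ in $B$, how many of its transitions lead to states in $C$. One
can then construct similar counts for each state in $B$ and the smaller of
$C_1$ and $C_2$ (call it $C_{small}$) by processing each transition leading
into $C_{small}$. That is,

$B_{11} = \{s \in B \mid 0 < |E(s) \cap C_{small}| < |E(s) \cap C|\}$

$B_{12} = \{s \in B \mid 0 = |E(s) \cap C_{small}|\}$

$B_2 = \{s \in B \mid |E(s) \cap C_{small}| = |E(s) \cap C|\}$

(These formulas are written for the case $C_{small} = C_2$; when $C_1$ is the
smaller set, the roles of $B_{12}$ and $B_2$ are exchanged.)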

To exploit this observation the three-way splitting algorithm maintains a list
of compound splitters, which are trees of splitters with respect to whose roots
the current partition is stable. In the previous example, $C$ would be the root
of a compound splitter, while $C_1$ and $C_2$ would be the children of $C$.
When three-way splitting is done with respect to $C_1$ and $C_2$, $C_1$ and
$C_2$ become the roots of new compound splitters if they have themselves been
previously split. More details may be found in [PT87,Fer90].
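
The count-based characterization of the three pieces can be sketched in Python
as follows. This is an illustration only: the function name is hypothetical,
and for brevity the counts into $C$ are recomputed from scratch, whereas the
point of [PT87] is that they are maintained incrementally so that only the
transitions into the smaller set need to be scanned.

```python
from collections import Counter

def three_way_split(B, C1, C2, E):
    """Split B with respect to C1 and C2, assuming every state of B has a
    transition into C = C1 | C2. Returns (B11, B12, B2)."""
    small = C1 if len(C1) <= len(C2) else C2
    C = C1 | C2
    # Number of transitions of each state into C (kept up to date in [PT87]).
    count_C = Counter(s for (s, t) in E if t in C)
    # Number of transitions into the smaller set: the only scan really needed.
    count_small = Counter(s for (s, t) in E if t in small)
    both = {s for s in B if 0 < count_small[s] < count_C[s]}
    only_small = {s for s in B if count_small[s] == count_C[s]}
    only_big = {s for s in B if count_small[s] == 0}
    if small is C1:
        return both, only_small, only_big   # B12 enters C1 only, B2 enters C2 only
    return both, only_big, only_small

# Hypothetical example: state 1 enters both sets, 2 only C1, 3 only C2.
example_pieces = three_way_split({1, 2, 3}, {10}, {20, 21},
                                 {(1, 10), (1, 20), (2, 10), (3, 21)})
```

The sketch returns the pieces in the order $(B_{11}, B_{12}, B_2)$ regardless of
which of $C_1$ and $C_2$ happens to be the smaller set.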

Adapting three-way splitting to KA-partitions. To adapt three-way splitting to 
KA-partitions it is convenient to recall how our algorithms construct “partition 
trees” labeled by KA-pairs. As was the case in the previous algorithm, we main- 
tain the following invariant in this tree: the right child of a two-child node has 
the same auxiliary set as its parent. The leaves of the tree constitute the current 
KA-partition, and a compound splitter is a subtree with the property that the 
current KA-partition is stable with respect to the label of the subtree’s root. 

Let $(B, X)$ be a KA-pair in the current KA-partition, and let $C$ be the root
node of a compound splitter having two subtrees. Assume further that the
KA-pair labeling $C$ is $(B', X')$ and that the label of $C$'s left child,
$C_1$, is $(B_1', X_1')$ and that the label of its right child, $C_2$, is
$(B_2', X')$. (Recall that the right child's auxiliary set is the same as its
parent's.) Then the result of splitting $(B, X)$ with respect to both $C_1$ and
$C_2$ will in general be the following.

$(B_{11}, X_1) = (B \cap E_1^{-1}(B_1') \cap E_1^{-1}(B_2'),\; X \cap E_2^{-1}(X_1'))$

$(B_{12}, X_1) = ((B \cap E_1^{-1}(B_1')) - E_1^{-1}(B_2'),\; X \cap E_2^{-1}(X_1'))$

$(B_2, X) = ((B - E_1^{-1}(B_1')) \cap E_1^{-1}(B_2'),\; X)$

The characterizations of the kernel sets $B_{11}$, $B_{12}$ and $B_2$ follow
from the discussion of the Paige-Tarjan algorithm above, but the associated
auxiliary sets deserve further comment. Regarding $(B_2, X)$, recall that since
$C$ is a node in the partition tree whose right child is $C_2$, the auxiliary
sets labeling $C$ and $C_2$ are the same. Since $(B, X)$ is stable with respect
to $C$, it follows that every state in $X$ has a transition into $X'$, the
auxiliary set of $C$ and hence of $C_2$. Since $B_2$ consists of the states of
$B$ with no transitions into $C_1$, it follows that every state in $X$ is a
candidate for simulating every state in $B_2$.

On the other hand, states in $B_{12}$ have transitions into $C_1$ but not
$C_2$. Since the auxiliary set of $C_1$ is a subset of the auxiliary set of
$C$, not every state in $X$, the auxiliary set of $B$, can safely simulate
states in $B_{12}$: only those with a transition into $X_1'$
($X \cap E_2^{-1}(X_1')$) can. A similar line of reasoning holds for
$(B_{11}, X_1)$. Note (surprisingly) that the auxiliary sets of $B_{11}$ and
$B_{12}$ are the same.
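
The three resulting KA-pairs can be computed directly from their set-theoretic
definitions. The Python sketch below does exactly that, without the counting
and data-structure machinery needed for efficiency; the function names are
hypothetical, and dropping pairs whose kernel set is empty is an assumption
made so that the result is again a KA-partition fragment.

```python
def preimage(E, target):
    """E^{-1}(target): the states with at least one transition into target."""
    return {s for (s, t) in E if t in target}

def ka_three_way_split(pair, C1, C2, E1, E2):
    """Split KA-pair (B, X) with respect to the children C1 = (B1', X1') and
    C2 = (B2', X') of a compound splitter. Returns the nonempty results."""
    B, X = pair
    (B1p, X1p), (B2p, _Xp) = C1, C2
    pre1, pre2 = preimage(E1, B1p), preimage(E1, B2p)
    X1 = X & preimage(E2, X1p)          # simulators must be able to enter X1'
    B11 = B & pre1 & pre2               # transitions into both kernel sets
    B12 = (B & pre1) - pre2             # transitions into B1' only
    B2 = (B - pre1) & pre2              # transitions into B2' only
    return [(b, x) for (b, x) in ((B11, X1), (B12, X1), (B2, X)) if b]

# Hypothetical example with lower states 1, 2, 3 and upper states x, y.
example_result = ka_three_way_split(
    (frozenset({1, 2, 3}), frozenset({"x", "y"})),
    (frozenset({10}), frozenset({"u"})),            # C1 = (B1', X1')
    (frozenset({20}), frozenset({"u", "v"})),       # C2 = (B2', X')
    {(1, 10), (1, 20), (2, 10), (3, 20)},           # E1
    {("x", "u"), ("y", "v")},                       # E2
)
```

In the example, state y can reach $X'$ but not $X_1'$, so it remains a candidate
simulator only for the kernel set $B_2 = \{3\}$.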

Figure 1 shows the resulting tree structure rooted at $B$. Node $(B_1, X_1)$ is
inserted so that the partition tree is binary; implicitly,
$B_1 = B_{11} \cup B_{12}$. Note that the invariant regarding right children is
maintained.

(B, X)
    (B1, X1)
        (B11, X1)
        (B12, X1)
    (B2, X)

Fig. 1. Three-way splitting.



An additional subtlety in KA-partitions is that compound splitters can have 
one child rather than two. This arises when a node’s auxiliary set is pruned 
without its kernel set being split. Such splitters can be treated as special cases 
of two-child splitters in which the right child’s kernel set is empty. A KA-pair 
split by such a splitter will only have one child, as its kernel set cannot be split. 

To implement these ideas efficiently we use several data structures. For each
KA-pair $D = (B, X)$ in the current partition (i.e. at the leaves of the
partition tree) we use doubly-linked lists D.B and D.X to represent $B$ and
$X$, respectively. This ensures constant-time insertions and deletions.
Internal tree nodes do not have their kernel and auxiliary sets represented
explicitly; rather, they may be reconstructed from the leaves that are
descendants of the node. We also use the following data structure for
efficiency reasons.

Kernel count table. To each compound splitter $C = (B', X')$ we associate a
hash table C.K that, for each state $s \in S_1$ in the “lower” transition
system, maintains $|B' \cap E_1(s)|$ (i.e. the number of transitions from $s$
into the kernel set of $C$). We use C.K(s) to stand for the count associated
with state $s \in S_1$. In three-way splitting, it suffices to compute
$C_{small}$.K, where $C_{small}$ is the smaller child of $C$, in order to
compute $B_{11}$, $B_{12}$, $B_2$ and $C_{big}$.K, where $C_{big}$ is the
larger of $C$'s children.

Auxiliary count table. In analogy with C.K, C.A records, for each
$s \in S_2$ in the “upper” transition system, the quantity
$|C.X \cap E_2(s)|$. So C.A(s) is the number of transitions $s$ has into the
set stored in C.X.

Incoming node lists. For each potential compound splitter $C = (B', X')$, C.F
records the list of KA-pairs whose kernel states have transitions to $B'$. This
information is needed to ensure that auxiliary sets are refined properly when
nodes are split with respect to $C$. In particular, if $C$'s right child,
$C_2$, has a smaller kernel set than its sibling $C_1$, and KA-pair
$D = (B, X)$ is such that $B$ only has transitions into $C_1$, then the
auxiliary set of $D$'s (only) child, which would be $X \cap E_2^{-1}(X_1')$,
will not be computed if only blocks with transitions into $C_2$ are analyzed.

For leaf nodes $C$, C.X stores the auxiliary set associated with the node. For
internal nodes $D$, in contrast, we use D.X to store difference sets. More
specifically, rather than storing the entire auxiliary set of the KA-pair
$(B, X)$ associated with $D$ in D.X, we store only those elements of the set
that are not in the auxiliary set of $D$'s left child. Let $D_1$ be the left
child of $D$, and let $(B_1, X_1)$ be $D_1$'s KA-pair. Then the set of states
stored in D.X is $X - X_1$. When doing three-way splitting on KA-pair
$(B, X)$ with respect to compound splitter $C = (B', X')$ whose left child
$C_1$ is labeled $(B_1', X_1')$, C.X can be used to compute the auxiliary set
$X_1$ of $(B_{11}, X_1)$ using the following identity.

$X \cap E_2^{-1}(X_1') = X - \{s \in X \mid E_2(s) \subseteq (X' - X_1')\}$    (1)

The use of difference sets has efficiency ramifications; in particular, the amortized 
analysis of the complexity of the algorithm relies on the use of difference sets. 

Special care must be taken for partition-tree nodes having only one child.
Since only leaves store KA-pairs, calculating the kernel set of an internal node
$D$ requires gathering all the kernel sets of the leaves in $D$'s subtree. In
the Paige-Tarjan algorithm [PT87], this may be done in time proportional to the
size of $D$'s kernel set, since every internal node has two children and kernel
sets are disjoint. Because of the existence of single-child chains in our tree,
this does not immediately apply. To solve this problem, we use path
compression: we add a field D.root that points to the first node on a
single-child chain that $D$ may be part of. For nodes that are the roots of
such chains, we add an additional field, D.end, that points to the end of its
single-child chain.

Bisimulation equivalence in auxiliary sets. The second direction for improving
the algorithm involves the exploitation of bisimulation equivalence classes in
auxiliary sets. The basic approach is to maintain a current partition for
states in the “upper” transition system, $T_2 = (S_2, E_2)$. Each block
represents an approximation to the bisimulation equivalence classes of $T_2$.
The fields D.X then point to lists of these equivalence classes rather than to
states.

More specifically, we use auxiliary list tables (ALTs) to store auxiliary sets.
An ALT has two kinds of entries.

Base entries point to lists of states in the upper transition system. Taken
together, the base entries form a partition of the state space.

List entries point to lists of base entries. These entries will in turn be
pointed to by the auxiliary set components C.X of a node $C$ in the partition
tree.

The lists in an ALT are implemented as doubly-linked lists in order to support
$O(1)$ insertions and deletions. In addition, for $s \in S_2$, baseOf(s)
retrieves the base entry $s$ belongs to; this can be implemented in constant
time by maintaining an array storing each auxiliary state's base entry $b$ and
position in the state list of $b$. We also associate with each base entry $b$ a
field b.t, which is used for splitting $b$, and a hash table b.R indexed by the
list entries it belongs to; b.R stores the positions of $b$ in these list
entries so that $b$ can be quickly deleted. Together baseOf and b.R allow the
query $s \in l$, where $l$ is a list entry, to be answered in $O(1)$ time:
first look up the base entry $b$ that $s$ belongs to, then look in b.R to see
if there is an entry for $l$. mkListEntry(l) is an initialization function; it
creates a list entry for a set of states by first creating a new base entry for
this set and then a new list entry containing only this base entry.
addBaseToList(b, l) adds a base entry $b$ to a list entry $l$ and saves $b$'s
position in the doubly-linked list of $l$ to b.R. removeBaseFromList(b, l)
removes $b$ from $l$ and adjusts b.R accordingly. duplicate(l) returns a new
list entry whose doubly-linked list contains the same bases as $l$.
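
The ALT operations just described can be mimicked with ordinary hash-based
containers. The Python sketch below is an assumption-laden illustration rather
than the data structure used in our implementation: Python dictionaries and
sets stand in for the doubly-linked lists and stored positions, and the class
and method names are hypothetical renderings of baseOf, mkListEntry,
addBaseToList, removeBaseFromList and duplicate.

```python
class BaseEntry:
    def __init__(self, states):
        self.states = set(states)   # upper-system states in this base entry
        self.t = set()              # scratch set used when the base is split (b.t)
        self.R = set()              # list entries containing this base (stands in for b.R)

class ListEntry:
    def __init__(self):
        self.bases = {}             # ordered dict standing in for a doubly-linked list

class ALT:
    def __init__(self):
        self.base_of = {}           # baseOf: state -> its base entry

    def mk_list_entry(self, states):
        """Create one base entry for the states and a list entry holding only it."""
        b = BaseEntry(states)
        for s in b.states:
            self.base_of[s] = b
        l = ListEntry()
        self.add_base_to_list(b, l)
        return l

    def add_base_to_list(self, b, l):
        l.bases[b] = None
        b.R.add(l)

    def remove_base_from_list(self, b, l):
        del l.bases[b]
        b.R.discard(l)

    def duplicate(self, l):
        """New list entry containing the same base entries as l."""
        copy = ListEntry()
        for b in l.bases:
            self.add_base_to_list(b, copy)
        return copy

    def contains(self, s, l):
        """Membership query 's in l' via baseOf and b.R, without scanning l."""
        return l in self.base_of[s].R
```

With this encoding a membership query costs one lookup in base_of and one in
b.R, which mirrors the two-step constant-time query described above.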

During the execution of the algorithm base entries will periodically require
splitting, since states in the same base entry may be determined not to be
bisimulation equivalent. For example, this happens when auxiliary sets are
“pruned” during three-way splitting: some states in a base entry $b$ may be
determined to have transitions to a given auxiliary set (which may be shown
always to be a union of bisimulation equivalence classes in the upper system),
while others do not. In this case the former states are moved to b.t. Then
operation processSplitBases splits bases whose b.t list is non-empty; such base
entries are called split bases. For a given split base $b$, processSplitBases
creates a new base entry $b'$ for the states in b.t if $b$ itself is non-empty
and moves b.t to $b'$. The procedure then updates list entries appropriately:
it takes another parameter, a list of pairs of list entries, with the first
list of each pair representing an “old home” of $b$ and the second representing
the “new home” for $b'$. (In general, the former list will be the auxiliary
list of a node and the latter an auxiliary list of a left child. The former
should be turned into a difference list, while the latter is expecting to be
populated with base entries.) No pair shares the same “old list” component, so
this list of pairs can be organized as a hash table, enabling membership
queries to be done in $O(1)$ time. For all other list entries that are not in
an “old list”, the routine adds the new base entry $b'$ to those already
containing $b$ so that the states they contain remain unchanged.

The algorithm in detail. Our algorithm computes the Relational Coarsest
KA-Partition from $T_1 = (S_1, E_1)$ to $T_2 = (S_2, E_2)$ in several stages.
It starts by building a KA-partition containing one KA-pair, $(S_1, S_2)$:
every state in $S_2$ is assumed to simulate every state in $S_1$, and all
states in $S_1$ and $S_2$ are assumed to be bisimulation equivalent.

The first step in the algorithm is to stabilize the KA-partition with respect
to the single KA-pair $(S_1, S_2)$. After creating a node $C$ and initializing
C.B to $S_1$ and C.X to $S_2$, C.B is split into states having transitions and
those that do not; the former are assigned to $C_1$.B, where $C_1$ is the left
child of $C$, while the latter are assigned to $C_2$.B, where $C_2$ is the
right child of $C$. The counters in C.K are also initialized to the number of
transitions each state has (except that states without transitions are not
touched). Then the auxiliary list of $C$ is copied into $C_2$.X, and $C_1$.X is
computed by scanning the transitions leading into C.X. This procedure may also
induce a split in the base entry containing $S_2$, since states without
transitions cannot be bisimilar to those that have them. The states with
transitions are assigned to a base entry that becomes part of $C_1$.X, while
those without become the elements of C.X, which is now a difference list. At
the end, there is a single compound splitter, $C$, with left child $C_1$ and
right child $C_2$.

The algorithm then loops, repeatedly removing splitters from a list of
splitters, performing the split, and potentially adding new splitters, until
the list of splitters is empty. Given a (compound) splitter $C$, the kernel
sets of the current partition (leaves in the partition tree) are split by
processing the child containing the smaller splitter. This entails decrementing
C.K(s) and incrementing $C_{small}$.K(s), where $C_{small}$ is the smaller
child of $C$. Then each KA-pair $D$ that is touched in this process is examined
and split using three-way splitting. Temporary fields $D.B_0$ and $D.B_1$ are
used for this purpose.

Following the three-way splitting operations, the auxiliary sets for KA-pairs 
with transitions into C are created. Right children are given copies of the auxil- 
iary sets of their parents by copying list entries in the ALT, and base entries are 
split when some states are determined to have transitions into some sets that 
others cannot match. Finally, splitter lists are updated; $C_1$ and $C_2$ are added as
compound splitters if they have children, as well as other nodes that were split 
and yet were not part of any splitter. 

Theorem 4. The algorithm converges, and the leaf KA-pairs form the relational
coarsest KA-partition when the algorithm terminates.

The next theorems characterize the complexity of our algorithm. Recall that
for transition system $T = (S, E)$, $\mathbb{S}$ refers to the set of
bisimulation equivalence classes of $T$. We also use $\mathbb{E}$ to refer to
the transition relation on $\mathbb{S}$ defined by: $(h_1, h_2) \in \mathbb{E}$
if and only if there exist $s_1 \in h_1$, $s_2 \in h_2$ such that
$E(s_1, s_2)$.
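
For concreteness, the induced transition relation on equivalence classes can
be computed from a partition as in the Python sketch below. The function name
and the representation of classes as frozensets are assumptions made for
illustration; the partition passed in is assumed to be the set of
bisimulation-equivalence classes.

```python
def quotient_transitions(partition, E):
    """Lift E to blocks: (h1, h2) is included when some s1 in h1 and
    s2 in h2 satisfy (s1, s2) in E."""
    block_of = {s: block for block in partition for s in block}
    return {(block_of[s], block_of[t]) for (s, t) in E}

# Hypothetical example: 1 -> 2 -> 3 <- 4 with classes {1}, {2, 4}, {3}.
example_classes = {frozenset({1}), frozenset({2, 4}), frozenset({3})}
example_quotient = quotient_transitions(example_classes, {(1, 2), (2, 3), (4, 3)})
```

In this example three transitions collapse to two class-level transitions,
which is the kind of saving that the class-based terms in the bounds below
reflect.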

Theorem 5. The overall running time of the three-way splitting algorithm with
path compression is
$O(|\mathbb{S}_1| \cdot |\mathbb{S}_2| + |E_1| \cdot \log(|S_1|) + |\mathbb{S}_1| \cdot |E_2| + |\mathbb{E}_1| \cdot |\mathbb{S}_2|)$.

Theorem 6. The space required by the three-way splitting algorithm is bounded
by $O(|E_1| + |E_2| + |\mathbb{S}_1| \cdot |\mathbb{S}_2|)$.

We conclude this section by comparing our time and space efficiency with the
simulation algorithm in [BP95]. That procedure ran in
$O(|S_1| \cdot |S_2| + |S_1| \cdot |E_2| + |S_2| \cdot |E_1|)$ time and
$O(|E_1| + |E_2| + |S_1| \cdot |S_2|)$ space. Our complexity results replace
many occurrences of $E_i$ and $S_i$ with $\mathbb{E}_i$ and $\mathbb{S}_i$;
indeed, the only worst-case penalty our procedure pays is the
$|E_1| \cdot \log(|S_1|)$ factor, which is due to the three-way splitting our
procedure performs. The experimental results in the next section nevertheless
indicate that our procedure works better in practice.

5 Experimental Results 

To assess the practical performance of our algorithm we implemented it in the 
Concurrency Workbench of the New Century (CWB-NC), a verification tool for 
finite-state systems (see www.cs.sunysb.edu/~cwb to obtain the system). The
CWB-NC analysis routines work on labeled transition systems, so we adapted 
our algorithm to this setting by adding an action parameter to the splitting op- 
eration and then splitting a partition with respect to all actions, given a splitter. 
The approach followed is similar to that presented in [Fer90] for adapting the 
Paige-Tarjan algorithm [PT87] to labeled transition systems. The implementa- 
tion of our algorithm involves 2,045 lines of Standard ML of New Jersey, with 
approximately a quarter of this total being devoted to maintaining ALT tables. 

We then tested four different simulation algorithms on case studies included 
in the CWB-NC release. The four simulation algorithms checked included: the 
implementation of the Bloom-Paige algorithm [BP95,CC95] included in the CWB- 
NC release; our naive algorithm; our algorithm with the ALT data structure but 
without path compression; and our full algorithm. In all cases early termina- 
tion is used: when two start states are found to be unrelated, the algorithm 
terminates. We ran the implementations on two different classes of systems. 

Railway-signaling schemes. Three models of the British Rail Slow-Scan com- 
munications protocol as modeled in [CLNS96] were compared to each other. 
The systems are implemented in a version of CCS with priorities. 
Alternating-bit protocols. Different versions of the alternating-bit protocol 
were compared, including ones that deadlocked and chains of cells. 

All testing was done on a Sun Ultra SparcIIi with two 336 MHz processors and 
3 GB of main memory. All times are reported in seconds of CPU time. 

The results for the railway models are reported in Table 2, while those for the
alternating-bit protocol may be found in Table 4. The columns headed “Bloom-Paige”
present times for the Bloom-Paige algorithm, “Naive KA-part” for our
naive algorithm, “Sim-ALT” for our more sophisticated algorithm without path
compression, and “Sim-ALT-PC” for our algorithm with path compression. We
also compared the performance of Sim-ALT and the Bloom-Paige algorithm
when the systems are first minimized with respect to strong bisimulation; the
“min +” columns give the minimization time plus the time of the subsequent
simulation check. Tables 1 and 3 give the sizes of the systems before and after
minimization. In all cases, “# states” and “# trans” refer to the numbers of
reachable states and transitions.



Table 1. Railway system sizes before and after minimization.

System       # states  # trans  # bisim classes  # bisim trans
basicSS      312       801      287              713
recoverySS   1100      2801     789              2233
ftolerantSS  11905     33760    7485             26165



Table 2. Railway simulation results (times in CPU seconds).

Agent 1      Agent 2      ans  Bloom-Paige  min + Bloom-Paige   Naive KA-part  Sim-ALT  Sim-ALT-PC  min + Sim-ALT
basicSS      recoverySS   T    124.74       7.14 + 62.91        24.22          7.52     9.62        7.14 + 2.44
basicSS      ftolerantSS  F    N/A¹         131.48 + 2109.90    330.78         139.80   139.02      131.48 + 26.63
recoverySS   ftolerantSS  F    N/A¹         137.21 + N/A¹       634.10         278.05   273.99      137.21 + 32.15
recoverySS   basicSS      F    186.23       7.14 + 157.28       28.67          14.95    13.60       7.14 + 1.39
ftolerantSS  basicSS      F    10831.63     131.48 + 1724.00    284.85         192.50   194.12      131.48 + 23.99
ftolerantSS  recoverySS   F    N/A¹         137.21 + 31104.24   192.95         256.73   267.59      137.21 + 17.91

1. Memory allocation error after > 4 hours.



Table 3. ABP system sizes before and after minimization.

System              # states  # trans  # bisim classes  # bisim trans
ABP-lossy           57        130      15               32
ABP-safe            49        74       18               32
Two-link-netw       1589      6819     197              791
Three-link-netw     44431     280456   2745             16188
Two-link-netw-safe  1171      3153     196              662


Table 4. ABP simulation results (times in CPU seconds).

Agent 1             Agent 2          ans  Bloom-Paige  min + Bloom-Paige  Naive KA-part  Sim-ALT  Sim-ALT-PC  min + Sim-ALT
ABP-lossy           Two-link-netw    F    11.07        2.00 + 0.20        4.21           2.78     5.57        2.00 + 0.12
ABP-lossy           Three-link-netw  F    7104.53      89.91 + 14.60      185.00         443.48   476.82      89.91 + 2.66
Two-link-netw       Three-link-netw  F    N/A          91.81 + 232.10     4320.63        662.97   625.84      91.81 + 6.71
Two-link-netw       ABP-lossy        F    18.34        2.00 + 0.24        5.52           6.31     5.30        2.00 + 0.13
Three-link-netw     ABP-lossy        F    1116.19      89.91 + 5.79       231.63         137.63   135.59      89.91 + 2.31
Three-link-netw     Two-link-netw    F    N/A¹         91.81 + 160.25     4966.73        473.28   561.37      91.81 + 88.47
ABP-safe            ABP-lossy        T    0.16         0.08 + 0.02        0.20           0.09     0.11        0.08 + 0.02
Two-link-netw-safe  Two-link-netw    T    432.58       1.99 + 3.82        84.27          5.60     5.68        1.99 + 0.95

1. Memory allocation error after > 1 hour.


Based on the times presented one may draw the following conclusions.

1. Our algorithms dramatically outperform the Bloom-Paige algorithm in time
and space. Even the naive algorithm substantially outperforms Bloom-Paige.
The degrees of improvement are often quite startling, ranging up to a factor
of 100 and beyond; we believe this is due to the space efficiency of our
algorithms, which causes them to use less virtual memory.

2. When there are few equivalence classes, minimizing and then running
Bloom-Paige can be competitive with our algorithms running on unminimized
systems. The ABP results suggest this in particular: minimization can
dramatically improve the performance of Bloom-Paige. This also leads us to believe
that paging is a major source of inefficiency in Bloom-Paige.

3. Sim-ALT substantially outperforms Bloom-Paige when both are run on
minimized systems. This result may seem surprising, given that our algorithm
is intended to combine the benefits of minimization with those of simulation
checking. However, as Theorem 5 shows, our algorithm’s time complexity
still contains factors involving the number of transitions in the input systems.

4. Path compression is a net loss for our algorithm. In order to obtain the
complexity result in Theorem 5 it was necessary to introduce path compression.
However, the performance figures suggest that this improvement does not
materialize in practice.

6 Conclusions and Future Work

This paper has presented an algorithm for determining whether or not one
transition system can simulate another. The procedure combines ideas from
traditional simulation algorithms with notions found in bisimulation-equivalence
procedures; the resulting routine has asymptotic time- and space-complexities
that approach those of the best-known algorithm [BP95,CC95]. In practice, our
approach dramatically outperforms the existing routines, owing to the fact that
our procedure exploits bisimulation equivalence to reduce both time and space
consumption.

As future work we plan to investigate the use of our ideas to improve
mu-calculus model checking. It is known that bisimilar systems satisfy the same
mu-calculus formulas; consequently, combining a bisimulation-equivalence
algorithm with a model checker could also yield potentially dramatic performance
improvements in practice. We also wish to investigate adaptations of our
algorithm in the computation of other relations, including the so-called “weak”
simulation ordering in which transitions labeled by internal actions are allowed
to be “absorbed” into transitions labeled by external actions. It should also be
noted that our algorithm is global: the transition system must be built before
the routine may be run. It would be interesting to investigate combining our
ideas with on-the-fly approaches to system minimization in order to avoid the a
priori construction of the system state spaces [BFH+92].

Related work. Bloom [Blo89] proposed an algorithm for ready simulation that
runs in $O((|E_1| + |E_2|) \cdot (|S_1| + |S_2|)^6)$ time. Bloom and Paige improved this result
to $O(|S_1| \cdot |T_2| + |S_2| \cdot |T_1|)$ in [BP95]; similar ideas may also be found in [CS90],
where preorder-checking is reduced to model checking, and in [CC95,HHK95].

Abstract. Message sequence charts (MSCs) are a standard notation for
describing the interaction between communicating objects. They are
popular among the designers of communication protocols. MSCs enjoy both
a visual and a textual representation. High level MSCs (HMSCs) allow
specifying infinite scenarios and different choices. Specifically, an HMSC
consists of a graph, where each node is a finite MSC with matched send
and receive events, and vice versa. In this paper we demonstrate a
weakness of HMSCs, which disallows one to model certain interactions. We
will show, by means of an example, that some simple finite state and
simple communication protocol cannot be represented using HMSCs. We
then propose an extension to the MSC standard, which allows HMSC
nodes to include unmatched messages. The corresponding graph
notation will be called HCMSC, which stands for High level Compositional
Message Sequence Charts. With the extended framework, we provide
an algorithm for automatically constructing an MSC representation for
finite state asynchronous message passing protocols.



1 Introduction 

Visual notations are useful in the design of large and complicated systems. They 
allow a more intuitive understanding of the behavior of the system and the re- 
lation between its components. They often allow abstracting away parts of the 
system that are less relevant for a particular view. Message sequence charts are
among the most frequently used formalisms for designing communication
protocols. Recently, they have also been used in the development of object oriented
systems, e.g. in UML. In recent years, we have observed the development of a
growing number of tools and algorithms for the manipulation of MSC based
designs [1,2,3,7,11,12].

The standard visual and textual notation [9] by ITU allows representing a 
single execution scenario, as well as a collection of scenarios, including choices 
and repetition. This is achieved by a notation called HMSC (High Level Message 
Sequence Chart), which consists of a graph, where each node contains a single 
MSC. The system behavior can follow the paths on that graph, starting from 
some initial node. In this paper we show, by means of an example, a limitation
of HMSCs. This limitation stems from the constraint that each MSC node in an
HMSC must have only matched send and receive events, i.e., each MSC must be 
labeled by message arrows. We show examples where one cannot break a possi- 
bly infinite computation of a finite state system into finitely many nodes with 
matched communication events. (A finite execution can always be represented 
as a single node.) We demonstrate that such undecomposable behaviors are not 
merely a theoretical result, but can represent the execution of real protocols. 

To circumvent the problem, we suggest an extension to the MSC standard, 
titled compositional message sequence charts (CMSC and HCMSC). This exten- 
sion allows specifying MSCs with unmatched sends and receives. The semantics 
of the new construct prescribes how to combine such MSC nodes together. We 
use the extended notation to suggest an algorithm for the automatic generation 
of HCMSC representations for finite state systems. We show that basic prop- 
erties of HCMSCs become undecidable, e.g. the question whether a message 
will be received in at least one computation. We propose to use a restriction of 
HCMSCs, called realizable HCMSCs. We show how to test whether an HCMSC 
is realizable in an efficient way. The notion of realizable HCMSC is quite natu- 
ral, as our algorithm for the HCMSC generation already yields HCMSCs of this 
kind. 

The deficiency of the original MSCs was also recognized in [10]. The solution 
suggested there is a different extension to HMSCs. According to this extension, 
one can use parallel components of MSCs, and allow intercommunication
between them, using a mechanism called ‘gates’. Our solution differs from that
of [10], as we study the effect of allowing communication between sequentially 
composed CMSCs. That is, a communication that starts in one CMSC and ends 
in a subsequent one. Notice that our solution is more canonical, since it does 
not make use of special message names for the purpose of binding by name iden- 
tifiers, as in [10]. Further papers considering this issue are [8,13]. These papers 
look at the problem of checking whether a finite state protocol can be translated 
into an HMSC. In the first of these papers, it is shown that this question is 
decidable, whereas the second paper shows that for a natural class of finite state 
protocols one can efficiently check whether the translation into an equivalent 
HMSC is possible. 

2 Preliminaries 

Each MSC describes a scenario where some processes communicate with each 
other. Such a scenario includes a description of the messages sent, messages re- 
ceived, the local events, and the ordering between them. In the visual description 
of MSCs, each process is represented as a vertical line, while a message is repre- 
sented by a horizontal or slanted arrow from the sending process to the receiving 
one, as in Figure 1. The corresponding ITU Z120 textual representation of the 
MSC appears on the right side part of Figure 1. 



msc MSC;
inst P1: process Root,
     P2: process Root,
     P3: process Root;
instance P1;
  out M1 to P2;
  in M5 from P2;
  in M6 from P3;
endinstance;
instance P2;
  in M1 from P1;
  out M2 to P3;
  out M3 to P3;
  in M4 from P3;
  out M5 to P1;
endinstance;
instance P3;
  in M2 from P2;
  in M3 from P2;
  out M4 to P2;
  out M6 to P1;
endinstance;
endmsc;

Fig. 1. Visual and textual representation of an MSC


Definition 1. An MSC $M$ is a tuple $(V, <, \mathcal{P}, \mathcal{N}, L, T, N, m)$ where

- $V$ is a (finite or infinite) set of events,
- $< \;\subseteq V \times V$ is an acyclic relation,
- $\mathcal{P}$ is a set of processes,
- $\mathcal{N}$ is a set of message names,
- $L : V \to \mathcal{P}$ is a mapping that associates each event with a process,
- $T : V \to \{s, r, l\}$ is a mapping that describes each event as send, receive or
  local, respectively,
- $N : V \to \mathcal{N}$ maps every event to a name,
- $m \subseteq V \times V$ is a partial function called matching that pairs up send and
  receive events. Each send is paired up with exactly one receive and vice versa.
  Events $v_1$ and $v_2$ can be paired up with each other only if $N(v_1) = N(v_2)$.

A message consists of a pair of matched send and receive events. For two events
$e$ and $f$, we have $e < f$ if and only if one of the following holds:

- $e$ and $f$ are a matching send and receive event, respectively,
- $e$ and $f$ belong to the same process $P$, with $e$ appearing before $f$ on the
  process line.

We assume fifo (first in first out) message passing, i.e.,

$$\bigl(T(e_1) = T(e_2) = s \wedge T(f_1) = T(f_2) = r \wedge m(e_1, f_1) \wedge m(e_2, f_2) \wedge
L(e_1) = L(e_2) \wedge L(f_1) = L(f_2) \wedge N(e_1) = N(e_2) \wedge e_1 < e_2\bigr) \Rightarrow f_1 < f_2 .$$

An MSC with a finite (an infinite, respectively) set of events is called a finite
(infinite, respectively) MSC.

Denote by $e \to f$ the fact that $e < f$ and either $e$ and $f$ are a matching
send and receive event, or $e$ and $f$ belong to the same process and there is no
event between $e$ and $f$ on that process line. That is, $e$ immediately precedes
$f$. The transitive closure of the relation $<$ is a partial order called the visual
ordering of events and it is obtained from the syntactical representation of the
chart (e.g. represented according to the standard syntax ITU-Z120 [9]). Clearly,
the visual ordering can be defined equivalently as the transitive closure of the
relation $\to$. A linearization of an MSC $M = (V, <, \mathcal{P}, \mathcal{N}, L, T, N, m)$ is a total
order on $V$ which extends the relation $(V, <)$.

Example 1. In the example MSC given in Figure 1, let us denote by $e_i$ the
send event and by $f_i$ the receive event of message $M_i$, $1 \le i \le 6$. Then we
have $V = \{e_i, f_i \mid 1 \le i \le 6\}$, $\mathcal{P} = \{P1, P2, P3\}$, $\mathcal{N} = \{M1, \dots, M6\}$ and
$N(e_i) = N(f_i) = M_i$ for all $i$. The events located on P1 are $\{e_1, f_5, f_6\} = L^{-1}(P1)$
with $T(e_1) = s$, $T(f_5) = T(f_6) = r$, and $e_1 < f_5 < f_6$. This ordering
is the time ordering of events on P1. We also have $m(e_i, f_i)$ and $e_i < f_i$ for all $i$
(message ordering). In particular, $e_1 < f_1 < e_2 < f_2$ and $e_1$ is the minimal event
of the MSC w.r.t. the visual ordering.

A type is a triple $(i, j, C)$, including two processes $P_i$ and $P_j$, and a message
name $C \in \mathcal{N}$. Each send or receive event has a type, according to the origin and
destination of the message, and the label of the message. Matching events have
the same type.

The partial order between the send and receive events of Figure 1 is shown
in Figure 2. In this figure, only the ‘immediately precedes’ order $\to$ is shown.
Notice for example that the send events of the two messages, M5 and M6, are
unordered.
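
The claims about the ordering of Figure 1 can be checked mechanically. The sketch below is an illustration under assumed conventions: each event is encoded as a pair such as `("s", "M1")` for the send of M1, the per-process event lists are transcribed from the textual representation in Figure 1, and the helper names are hypothetical.

```python
from itertools import product

# Events on each process line, in order, transcribed from Figure 1.
lines = {
    "P1": [("s", "M1"), ("r", "M5"), ("r", "M6")],
    "P2": [("r", "M1"), ("s", "M2"), ("s", "M3"), ("r", "M4"), ("s", "M5")],
    "P3": [("r", "M2"), ("r", "M3"), ("s", "M4"), ("s", "M6")],
}


def immediate_order(lines):
    """The 'immediately precedes' relation: process-line steps and message arrows."""
    edges = set()
    for events in lines.values():
        edges |= set(zip(events, events[1:]))            # consecutive events on a line
        for kind, msg in events:
            if kind == "s":
                edges.add((("s", msg), ("r", msg)))      # send precedes matching receive
    return edges


def visual_order(lines):
    """Transitive closure of the immediate order (Warshall-style, pivot k outermost)."""
    events = [e for evs in lines.values() for e in evs]
    lt = immediate_order(lines)
    for k, i, j in product(events, repeat=3):
        if (i, k) in lt and (k, j) in lt:
            lt.add((i, j))
    return lt


order = visual_order(lines)
e1, f1, e2, f2 = ("s", "M1"), ("r", "M1"), ("s", "M2"), ("r", "M2")
print((e1, f1) in order, (f1, e2) in order, (e2, f2) in order)
```

Any topological sort of `immediate_order(lines)` would be a linearization in the sense defined above.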

Definition 2. The concatenation of two MSCs $M_1 = (V_1, <_1, \mathcal{P}, \mathcal{N}_1, L_1, T_1,
N_1, m_1)$ and $M_2 = (V_2, <_2, \mathcal{P}, \mathcal{N}_2, L_2, T_2, N_2, m_2)$ over the same set of
processes $\mathcal{P}$ and disjoint sets of events $V_1 \cap V_2 = \emptyset$ (we can always rename events so
that the sets become disjoint), denoted $M_1 M_2$, is $(V_1 \cup V_2, <, \mathcal{P}, \mathcal{N}_1 \cup \mathcal{N}_2, L_1 \cup
L_2, T_1 \cup T_2, N_1 \cup N_2, m_1 \cup m_2)$, where

$$< \;=\; <_1 \cup <_2 \cup\; \{(p, q) \mid L_1(p) = L_2(q) \wedge p \in V_1 \wedge q \in V_2\} .$$

That is, the events of $M_1$ precede the events of $M_2$ for each process,
respectively. If $M = M_1 M_2$, we say that $M_1$ is a prefix of $M$. Notice that there is
no synchronization of the different processes when moving from one node to the
other. Hence, it is possible that one process is still involved in some actions of
one node, while another process has advanced to a different node. The infinite
concatenation of finite MSCs is defined in a similar way.
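
The lack of synchronization between processes can be seen on a two-node example. The following sketch assumes a simplified encoding that is not part of the definition: an MSC is a pair of a dictionary mapping events to processes and a set of ordered pairs, and event names such as `"a!"` and `"a?"` are hypothetical labels for a send and its receive.

```python
from itertools import product


def transitive_closure(events, lt):
    """Warshall-style closure; the pivot k is the outermost loop variable."""
    closed = set(lt)
    for k, i, j in product(events, repeat=3):
        if (i, k) in closed and (k, j) in closed:
            closed.add((i, j))
    return closed


def concat(m1, m2):
    """Concatenate two MSCs given as (process_of_event, order_pairs)."""
    proc1, lt1 = m1
    proc2, lt2 = m2
    assert not set(proc1) & set(proc2), "event sets must be disjoint"
    # Events of m1 precede events of m2 on the same process only.
    cross = {(p, q) for p in proc1 for q in proc2 if proc1[p] == proc2[q]}
    return {**proc1, **proc2}, lt1 | lt2 | cross


# m1: P1 sends a to P2.  m2: P1 sends b to P3.
m1 = ({"a!": "P1", "a?": "P2"}, {("a!", "a?")})
m2 = ({"b!": "P1", "b?": "P3"}, {("b!", "b?")})
proc, lt = concat(m1, m2)
order = transitive_closure(list(proc), lt)
print(("a!", "b!") in order, ("a?", "b!") in order)
```

In the result, the send `b!` of the second node is ordered after `a!` but is unordered with respect to the receive `a?` of the first node, so P1 may already be executing the second node while P2 has not yet finished the first.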

Definition 3. Let $M_1, M_2, \dots$ be an infinite sequence of finite MSCs, $M_i =
(V_i, <_i, \mathcal{P}, \mathcal{N}_i, L_i, T_i, N_i, m_i)$. Then the infinite concatenation $M_1 M_2 \dots$ is
defined as the MSC $(V, <, \mathcal{P}, \mathcal{N}, L, T, N, m)$ where $V = \biguplus_{i \ge 1} V_i$ is the disjoint
union of the $V_i$, $\mathcal{N} = \bigcup_{i \ge 1} \mathcal{N}_i$, $L|_{V_i} = L_i$, $T|_{V_i} = T_i$, $N|_{V_i} = N_i$, $m = \bigcup_{i \ge 1} m_i$
and

$$< \;=\; \bigcup_{i \ge 1} <_i \;\cup\; \{(p, q) \mid L_i(p) = L_j(q) \wedge p \in V_i \wedge q \in V_j \wedge i < j\} .$$

Fig. 2. The partial order between the events of the MSC in Figure 1.



Since a communication system usually includes many (or even infinitely
many) such scenarios, a high level description is needed for combining them
together. The standard description consists of a graph called HMSC (high level
MSC), where each node contains one MSC as in Figure 3. Each maximal path
in this graph (i.e., a path that is either infinite or ends with a node without
outgoing edges) that starts from a designated initial state corresponds to a single
execution or scenario. Such an execution can be used to denote the
communication structure of a typical (aka ‘sunny day’) or an exceptional (aka ‘rainy
day’) behavior of a system, or a counterexample found during testing or model
checking.

Definition 4. An HMSC $N$ is a 4-tuple $(S, \tau, s_0, c)$ where $S$ is a finite set of
states, each labeled by some finite MSC over the same set of processes, and with
sets of events disjoint from one another. The mapping $c$ associates the state $s$
with an MSC $c(s)$. By $\tau \subseteq S \times S$ we denote the edge relation and the initial state
is $s_0 \in S$. An execution of $N$ is a (finite or infinite) MSC $c(s_0)\, c(s_1)\, c(s_2) \dots$
associated with a maximal path of $N$ that starts with the initial state $s_0$.

Figure 3 shows an example of an HMSC where the node in the upper left 
corner is the starting node. The executions of this system are either finite or 
infinite. Note that according to HMSC semantics, process P2 in Figure 3 may
send its Report message after process P1 has progressed into the next node and
has sent its Req_service message.




Fig. 3. An HMSC graph 



3 MSC Decomposition 

The HMSC model combines the visual notation of message sequence charts with 
the ability to describe repetitions and alternative computations. In this section
we will show that this seemingly powerful model cannot describe some basic
finite state communication protocols. The main problem lies within the
requirement that the send and receive events in each node must be matched.

We want to exemplify that there are finite state protocols that do not allow
a finite HMSC representation. To do that, we show an infinite execution $\xi$ of
a finite state protocol with the following property: There is no way to write $\xi$
as an infinite concatenation of finite MSCs. Given the above property, it is not
possible to construct an HMSC such that $\xi$ would correspond to a traversal of
one of the HMSC paths. Thus, we cannot represent such a system using HMSCs.

As an example, consider the infinite MSC whose prefix appears in Figure 4.
We assume that P1 repeatedly sends a message m to P2, and P2 repeatedly
sends a second message back to P1. We omit the message labels below. We can
model, for example, each of the processes P1 and P2 by a finite state machine.
Here, P1 starts by sending message m to P2 twice, then it alternates between
receiving from P2 and sending m back to P2. Process P2 alternates between sending
to P1 and receiving m from P1. We show that this infinite MSC cannot be
decomposed into a product of finite MSCs. We start with the send event $e_1$ and
its matching receive $f_1$. Obviously, because of the compulsory matching in
HMSCs, they must belong to the same MSC node. We have the send event $g_1$
preceding $f_1$, on the same process line, while its corresponding receive event $h_1$
succeeds the send $e_1$. Thus, the events $g_1$ and $h_1$ must be in the same node with
$e_1$ and $f_1$. For the same reason, we have that $e_2$ and $f_2$ must belong to the same
node with $g_1$ and $h_1$, and so forth.
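
The argument can also be checked by brute force on a finite prefix. The first node of any decomposition would have to consist of a finite, nonempty prefix of each process line in which every send is matched by a receive. The sketch below searches for such a cut; the encoding of the two process lines follows the description above, and the function names and the `initial_sends` parameter are assumptions introduced for illustration.

```python
def p1_event(i, initial_sends=2):
    """i-th event on P1: `initial_sends` sends, then receive/send alternately."""
    if i < initial_sends:
        return "send"
    return "recv" if (i - initial_sends) % 2 == 0 else "send"


def p2_event(i):
    """i-th event on P2: send/receive alternately, starting with a send."""
    return "send" if i % 2 == 0 else "recv"


def matched_cuts(limit, initial_sends=2):
    """All nonempty prefix pairs (a, b) up to `limit` in which every message is matched."""
    cuts = []
    for a in range(limit + 1):
        for b in range(limit + 1):
            p1 = [p1_event(i, initial_sends) for i in range(a)]
            p2 = [p2_event(i) for i in range(b)]
            balanced = (p1.count("send") == p2.count("recv")
                        and p2.count("send") == p1.count("recv"))
            if (a, b) != (0, 0) and balanced:
                cuts.append((a, b))
    return cuts


print(matched_cuts(40))                    # no cut exists for the protocol above
print(matched_cuts(6, initial_sends=1))    # a variant with one initial send does have cuts
```

With a single initial send the two processes regain balance after every exchange, which is why that variant can be split into finite matched nodes while the protocol of Figure 4 cannot.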




Fig. 4. A prefix of an MSC execution that cannot be decomposed.



While the repeated crossing of message edges seems to be atypical for MSCs,
the above behavior $\xi$ describes a possible execution of an actual protocol [15],
where messages and acknowledgments are being sent between two processes,
with (bounded) buffering.

4 Compositional MSCs 

In order to represent communication protocols whose description could only be
approximated using standard MSCs, we suggest an extension of the MSC
standard. Intuitively, a compositional MSC, or CMSC, may include send events that
are not matched by corresponding receive events and vice versa. An unmatched
send event may be matched in future HCMSC nodes (on some path).
Similarly, an unmatched receive event may be matched in previous HCMSC nodes.
The definition of a CMSC is hence similar to that of an MSC, except that unmatched
send and receive events are allowed. (Because of the similarity to Definition 1, we
omit the formal definition with the corresponding change.)

We denote an unmatched send by a message arrow, where the receive end (the 
target of the arrow) appears within an empty circle. Similarly, an unmatched 
receive is denoted by an arrow where the send part (the source of the arrow) 
appears within a circle. CMSC arrows where both the send and the receive 
are unmatched events are forbidden. Moreover, we also disallow an unmatched 
receive event to be followed by a matched receive event of the same type in the 
same CMSC node. Similarly, we disallow an unmatched send event to be preceded 
by a matched send event of the same type in the same CMSC node. In Figure 5, 
we can see an HCMSC that represents the execution that is approximated in 
Figure 4. 




Fig. 5. A decomposition of the execution in Figure 4.



Before defining the concatenation of CMSCs let us call a CMSC left-closed
if it does not contain unmatched receive events.

Definition 5. The concatenation $M_1 M_2$ of two CMSCs $M_1 = (V_1, <_1, \mathcal{P}, \mathcal{N}_1,
L_1, T_1, N_1, m_1)$ and $M_2 = (V_2, <_2, \mathcal{P}, \mathcal{N}_2, L_2, T_2, N_2, m_2)$ over disjoint event
sets is defined when the following conditions hold:

1. $M_1$ is left-closed.

2. For any type $t$, the number of unmatched receive events of type $t$ in $M_2$ is at
most equal to the number of unmatched send events of type $t$ in $M_1$.

3. If $M_2$ contains a matched send event of type $t$ then the number of unmatched
receive events of type $t$ in $M_2$ is equal to the number of unmatched send events
of type $t$ in $M_1$.

Define a matching function $m$ that pairs up unmatched send events of $M_1$ with
unmatched receive events of $M_2$ according to their order on their process lines.
That is, the $i$th unmatched send in $M_1$ is paired up with the $i$th unmatched
receive event of the same type in $M_2$. Notice that the function $m$ is uniquely
defined.

The concatenation $M_1 M_2$ is then defined as $(V_1 \cup V_2, <, \mathcal{P}, \mathcal{N}_1 \cup \mathcal{N}_2, L_1 \cup
L_2, T_1 \cup T_2, N_1 \cup N_2, m_1 \cup m_2 \cup m)$, where

$$< \;=\; <_1 \cup <_2 \cup\; \{(p, q) \mid L(p) = L(q) \wedge p \in V_1 \wedge q \in V_2\} \;\cup\; \{(p, q) \mid (p, q) \in m\} .$$

It is easy to see that a concatenation always results in a left-closed CMSC.
Moreover, if $M_1$ and $M_2$ both satisfy the fifo restriction, then $M_1 M_2$ also does. This
follows from the last requirement in the definition. Note that this requirement
is consistent with our fifo definition, which applies only to messages with the
same name. Thus, if we require instead that the fifo condition be satisfied by all
messages from one process to another, we have to modify the last
requirement of the definition of the concatenation accordingly. Infinite
concatenation and HCMSCs are defined in an analogous way to Definitions 3 and 4.
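
The three conditions of Definition 5 only depend on per-type counts, so they can be sketched on a small summary of each CMSC. The class `CMSCSummary` and both functions below are hypothetical names introduced for this illustration; a type is encoded as a tuple `(sender, receiver, name)`, and the order of events inside a node is deliberately not modeled.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CMSCSummary:
    unmatched_sends: Counter = field(default_factory=Counter)   # type -> count
    unmatched_recvs: Counter = field(default_factory=Counter)   # type -> count
    matched_send_types: set = field(default_factory=set)        # types with a matched send


def can_concatenate(m1, m2):
    # Condition 1: m1 must be left-closed.
    if sum(m1.unmatched_recvs.values()) > 0:
        return False
    # Condition 2: m2 may not receive more than m1 has left unmatched.
    if any(n > m1.unmatched_sends[t] for t, n in m2.unmatched_recvs.items()):
        return False
    # Condition 3: a matched send in m2 requires all pending sends of that type to be consumed.
    return all(m2.unmatched_recvs[t] == m1.unmatched_sends[t]
               for t in m2.matched_send_types)


def concatenate(m1, m2):
    assert can_concatenate(m1, m2)
    newly_matched = {t for t, n in m2.unmatched_recvs.items() if n > 0}
    return CMSCSummary(
        unmatched_sends=(m1.unmatched_sends - m2.unmatched_recvs) + m2.unmatched_sends,
        unmatched_recvs=Counter(),   # the result is always left-closed
        matched_send_types=m1.matched_send_types | m2.matched_send_types | newly_matched,
    )


t = ("P1", "P2", "m")
a = CMSCSummary(unmatched_sends=Counter({t: 2}))
b = CMSCSummary(unmatched_sends=Counter({t: 1}), unmatched_recvs=Counter({t: 1}))
c = CMSCSummary(unmatched_recvs=Counter({t: 1}), matched_send_types={t})
print(can_concatenate(a, b), can_concatenate(a, c), can_concatenate(b, a))
```

Here `a` followed by `b` is allowed and leaves two sends pending, while `a` followed by `c` is rejected because the matched send in `c` would overtake a pending send of the same type.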

5 Undecidability 

Extending the MSC standard allows representing the execution of a bigger class
of protocols than what is allowed by the ITU standard. However, unsurprisingly,
with the added expressiveness we lose some of the power of analyzing such
systems.

Unlike simple HMSCs, where some simple properties can be checked, see
e.g. [12], in HCMSCs one cannot decide even the trivial property of whether
a particular message can be sent or received in at least one computation. The
undecidability proof will be a reduction from the Post Correspondence Problem
(PCP). An instance of PCP is a set of pairs of words

$$\{(v_1, w_1), (v_2, w_2), \dots, (v_m, w_m)\}$$

over some common alphabet $\Sigma$. We want to find out if there is some integer
$n > 0$ and some sequence of indexes $i_1, i_2, \dots, i_n$ such that $v_{i_1} v_{i_2} \dots v_{i_n} =
w_{i_1} w_{i_2} \dots w_{i_n}$. We require in addition that the PCP solution is such that $i_n = 1$.
This is not a restriction, since we can use a suitable encoding so that whenever
$w_{i_1} w_{i_2} \dots w_{i_{n-1}} w_1$ is a prefix of $v_{i_1} v_{i_2} \dots v_{i_{n-1}} v_1$, then these two words are equal.
We need this variant of PCP for technical reasons which will become clear in
the proof below.
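
To make the PCP variant concrete, the following sketch searches index sequences of bounded length whose last index is 1. Since PCP is undecidable, such a search can only confirm solutions up to the chosen bound; the function name `pcp_solution`, the bound `max_len` and the sample instance are assumptions for illustration and do not come from the reduction itself.

```python
from itertools import product


def pcp_solution(pairs, max_len):
    """Return a 1-based index sequence i_1..i_n (n <= max_len, i_n = 1) with equal
    concatenations of the v-words and the w-words, or None if none is found."""
    m = len(pairs)
    for n in range(1, max_len + 1):
        for prefix in product(range(1, m + 1), repeat=n - 1):
            seq = prefix + (1,)                      # the variant requires i_n = 1
            v = "".join(pairs[i - 1][0] for i in seq)
            w = "".join(pairs[i - 1][1] for i in seq)
            if v == w:
                return seq
    return None


# Sample instance: (v_1, w_1) = (a, baa), (v_2, w_2) = (ab, aa), (v_3, w_3) = (bba, bb).
pairs = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
print(pcp_solution(pairs, 4))
```

For the sample instance the sequence 3, 2, 3, 1 yields the word bbaabbbaa on both sides.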

We will construct an HCMSC with five processes P1 to P5, and with CMSC
nodes $F_1, F_2, \dots, F_m$, $F'_1, F'_2, \dots, F'_m$, $F$, $F_f$.

- Messages from P1 to P2 correspond to the letters of $\Sigma$. Each CMSC $F_i$
  contains a sequence of unmatched send events from P1 to P2, representing
  the sequence of messages of $v_i$. Each CMSC $F'_i$ contains a sequence of
  unmatched receive events from P1 to P2, representing the sequence of messages
  of $w_i$.

- Messages from P3 to P4 correspond to the index of the PCP word being
  sent. Each CMSC $F_i$ contains also a single unmatched send from P3 to P4
  representing the current index $i$. Each CMSC $F'_i$ contains the corresponding
  unmatched receive event.

The HCMSC $N$ has the form $F (F_1 + \dots + F_m)^* (F'_1 + \dots + F'_m)^* F'_1 F_f$. That is, $N$
starts at some initial node $F$, which contains only one unmatched send from P1
to P5. Then one can repeatedly take nodes of the form $F_i$, any number of times.
Then one can take any number of nodes of the form $F'_i$, followed by the nodes $F'_1$,
$F_f$. The sink node $F_f$ contains a message from P2 to P5, then a message from
P4 to P5 and finally an unmatched receive (matching the send from node $F$)
from P1 to P5. Notice that whenever the message from P1 to P5 is received, the
sequence $w_{i_1} \dots w_{i_n}$ corresponding to the unmatched receive events on the path
in $N$ is a prefix of $v_{i_1} \dots v_{i_n}$ corresponding to the unmatched send events,
and $i_n = 1$. Under our assumption about PCP words this means equality, i.e.,
$v_{i_1} \dots v_{i_n} = w_{i_1} \dots w_{i_n}$, and we obtained a solution. Notice that we might have
unmatched sends on P1 and P3 in the CMSC associated with the path in $N$. This
explains why we obtain only the prefix relation and why we need the particular
PCP encoding. Thus, the message from P1 to P5 is received if and only if there
is a nonempty solution to the PCP instance.

6 Realizable HCMSCs 

With HCMSCs as defined so far, not all executions correspond to CMSC
scenarios. We define realizable HCMSCs, a subclass where all maximal
executions define left-closed CMSCs. Note that we explicitly allow executions with
unmatched send events. For example, the HCMSC of Figure 5 is such that
every finite execution is a left-closed CMSC with unmatched sends. However, the
(unique) infinite execution corresponds to an infinite MSC.

Definition 6. An HCMSC is realizable if the execution of every finite path
starting with the initial state is a left-closed CMSC.

We will show that one can efficiently test whether an HCMSC is realizable.
Consider the CMSC $M = c(s_0) c(s_1) \cdots c(s_k)$ associated with a finite path $\chi =
s_0, s_1, \dots, s_k$ of the HCMSC $N$ with initial state $s_0$. Let $t$ be a type, then the
t-deficit $D_t(\chi)$ of $\chi$ is the difference between the number of send events and
the number of receive events of type $t$ in $\chi$. A necessary condition for $N$ to be
realizable is that $D_t(\chi) \ge 0$ for every loop $\chi$ and every type $t$. More generally,
an HCMSC $N = (S, \tau, s_0, c)$ is realizable if and only if every node $s$ which is
accessible from the initial node satisfies the following conditions. Assume that
node $s$ contains $x$ unmatched receives of type $t$. Then $D_t(\chi) \ge x$ for all paths
$\chi$ from $s_0$ to $s'$ with $(s', s) \in \tau$. Moreover, if node $s$ also contains a matched
send of type $t$, then $D_t(\chi) = x$ for all paths $\chi$ from $s_0$ to $s'$ with $(s', s) \in \tau$.

We describe below the algorithm for checking that an HCMSC $N$ is realizable.
We define for each state $s$ and each type $t$ the t-deficit $d_t(s)$ of $s$ as the difference
between unmatched sends of type $t$ and unmatched receives of type $t$ in $s$. We
can view $N$ as a weighted directed graph $G_t(N) = (S, \tau, \gamma)$, with edges weighted
by $\gamma(s', s) = d_t(s')$. That is, each edge is labeled by the t-deficit of its source
node. Then all we have to do is the following:

1. Check that $G_t(N)$ has no cycle with negative weight.

2. Check for all states $s, s'$ such that $(s', s) \in \tau$: the minimal weight $d$ of a path
from $s_0$ to $s'$ satisfies $d \ge x$, where $x$ is the number of unmatched receives
of type $t$ in $s$.

3. Check for all states $s, s'$ such that $(s', s) \in \tau$ and $s$ contains a matched
send of type $t$: the maximal weight $d$ of a path from $s_0$ to $s'$ satisfies $d \le x$,
where $x$ is the number of unmatched receives of type $t$ in $s$.

For the first two items above we can apply a dynamic programming algorithm
(Warshall's algorithm) for computing the shortest paths between all pairs of
nodes in time $O(|S|^3)$. That is, assuming that $S = \{s_1, \dots, s_n\}$, we compute the
minimal weight of paths from state $s_i$ to state $s_j$ by allowing as intermediate
nodes $\emptyset$, then $\{s_1\}$, $\{s_1, s_2\}$ up to $S$. Alternatively, we can use the Bellman-Ford
algorithm [4]. This algorithm computes in time $O(|S||\tau|)$ all shortest paths from
a given source in a graph $G$ with negative weights, provided that $G$ contains no
negative cycle (detecting such a cycle, if one exists). Combining the second and
the third item above we need to check for all states $s$ containing a matched
send of type $t$ and all nodes $s'$ where $(s', s) \in \tau$ that all paths from $s_0$ to $s'$ have
the same t-deficit, say $D(s')$. This means that we first compute the t-deficits
along one path $\chi$ from $s_0$ to $s$. Let $D(s) = D_t(\chi)$. Then we compute backwards,
from state $s$ on, the deficits $D(s')$ for all nodes $s'$ belonging to paths from $s_0$
to $s$. It remains to check for each pair $s' \ne s''$ of nodes between $s_0$ and $s$ with
$(s', s'') \in \tau$ that we have $D(s') + d_t(s'') = D(s'')$. The last step can be done edge
by edge. The overall complexity is in $O(|\tau|)$. Doing all this for all graphs $G_t(N)$
yields an $O(|\mathcal{P}|^2 |\mathcal{N}| |S| |\tau|)$ algorithm for checking whether $N$ is realizable.
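
The checks for a single type $t$ can be sketched with two Bellman-Ford passes, one for the minimal and one for the maximal deficit. This is an illustration under stated assumptions rather than the procedure exactly as described: deficits are accumulated as node weights along a path (instead of weights on the outgoing edges), the initial node is additionally required to have no unmatched receives, and the dictionaries `deficit`, `unmatched_recv` and `matched_send` are a hypothetical encoding of the per-node data.

```python
from math import inf


def reachable_from(sources, succ):
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def realizable_for_type(edges, s0, deficit, unmatched_recv, matched_send):
    """Check the realizability conditions for one message type."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    reach = reachable_from([s0], succ)
    live = [(u, v) for (u, v) in edges if u in reach]
    if unmatched_recv[s0] > 0:          # assumption: the initial node must be left-closed
        return False

    def bellman_ford(sign):
        # dist[s] = sign * (extreme deficit over all paths from s0 to s, s included)
        dist = {s: inf for s in reach}
        dist[s0] = sign * deficit[s0]
        for _ in range(len(reach) - 1):
            for u, v in live:
                dist[v] = min(dist[v], dist[u] + sign * deficit[v])
        # Nodes that can still be improved lie on or behind a cycle of negative weight.
        unstable = [v for u, v in live if dist[u] + sign * deficit[v] < dist[v]]
        return dist, reachable_from(unstable, succ)

    lo, below_negative_cycle = bellman_ford(+1)
    if below_negative_cycle:            # item 1: a reachable loop with negative deficit
        return False
    hi_neg, unbounded = bellman_ford(-1)
    for u, v in live:
        x = unmatched_recv[v]
        if lo[u] < x:                   # item 2: not enough pending sends on some path
            return False
        if matched_send[v] and (u in unbounded or -hi_neg[u] > x):
            return False                # item 3: too many pending sends on some path
    return True


ok = realizable_for_type(
    edges=[("A", "B"), ("B", "A")], s0="A",
    deficit={"A": 1, "B": -1},
    unmatched_recv={"A": 0, "B": 1},
    matched_send={"A": False, "B": False},
)
print(ok)
```

Running the function once per type corresponds to the loop over all graphs $G_t(N)$ in the text.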

We conclude this section with a remark on the regularity of the set of
executions of an HCMSC. Note that a realizable HCMSC $N$ has bounded message
queues if and only if $D_t(x) = 0$ for every loop $x$ in $N$ and every type $t$. It is
not difficult to see that bounded message queues do not ensure that the set of
linearizations of executions in an HMSC or an HCMSC is regular. In the case
of HMSCs a syntactic restriction which is sufficient for regularity has been
proposed in [3,11]. This condition states that the communication graph of every
loop in the HMSC must be strongly connected. The communication graph of an
MSC $M$ is a directed graph with vertex set consisting of all processes which
occur in $M$. There is an edge from process $P$ to process $Q$ if $P$ sends a message
to $Q$ in $M$. The communication graph of a path $\pi$ in an HMSC is the
communication graph of the MSC associated with $\pi$. We show in the following a
similar syntactic condition for HCMSC which is sufficient for obtaining a regular
set of linearizations, provided that the message queues are bounded. For this we
define the communication graph of a CMSC $M$ as follows. As before, vertices
are those processes with events occurring in $M$. We have an edge from $P$ to $Q$
if there is a (matched or unmatched) send event on $P$ with target process $Q$. As
for HMSCs we require that the communication graph of any loop in the HCMSC
is strongly connected.
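
As an illustration, the strong-connectivity condition on a communication graph can be checked as in the following Python sketch. It assumes that a loop is summarized by the list of (sender, receiver) pairs of its send events, and that every process of interest appears in at least one such pair; processes with only local events would have to be added by the caller. All function names are hypothetical.

```python
def communication_graph(sends):
    """Build the graph from (sender, receiver) pairs, matched or unmatched."""
    graph = {}
    for p, q in sends:
        graph.setdefault(p, set()).add(q)
        graph.setdefault(q, set())
    return graph

def reachable(graph, start):
    """Set of vertices reachable from start."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strongly_connected(graph):
    """True if every vertex reaches every other vertex."""
    if not graph:
        return True
    nodes = set(graph)
    start = next(iter(nodes))
    reverse = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)
    # Strongly connected iff start reaches all nodes forwards and backwards.
    return reachable(graph, start) == nodes and reachable(reverse, start) == nodes
```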

Proposition 1. Let $N$ be an HCMSC with bounded message queues, i.e., the
deficit of every execution $x$ of $N$ is such that $D_t(x) \le k$ for some constant $k$
depending on $N$ and for any type $t$. Assume that the communication graph of any
loop in $N$ is strongly connected. Then the set of linearizations of $N$ is regular.

The proposition above can be shown using the same ideas as for HMSCs (see
[3,11]). We can show that for any linearization of an execution
$c(s_0)c(s_1) \cdots c(s_m)$ of $N$ it suffices to store a polynomial number of
prefixes of CMSCs $c(s_i)$. We use the fact that the deficit $D_t(x)$ of any path $x$
is at most equal to the size of the HCMSC $N$.

7 An HCMSC Representation for Finite State Systems 

The HCMSC extension suggested in this paper broadens the scope of HMSCs 
and allows us to capture many more protocols. We present now an automatic 
translation from finite state systems with asynchronous message passing to (re- 
alizable) HCMSC. 

We are given a finite state space $G = (S, S_0, E, \Sigma)$, with states $S$, initial
states $S_0 \subseteq S$, and edges $E \subseteq S \times \Sigma \times S$, labeled over a
set of actions $\Sigma$. The actions in $\Sigma$ are send, receive and local actions.
The states in $S$ contain information about the system, including the contents of
the various interprocess message queues.

We start with a trivial translation, which establishes the theoretic possi- 
bility of performing such a translation for a class of finite state systems with 
asynchronous message passing. We later proceed to suggest a more informative 
translation. The trivial translation is performed by constructing the dual graph 
$\mathcal{N} = (N, N_0, T)$ of $G$ as follows:

— The nodes $N$ of $\mathcal{N}$ correspond to the edges of $G$. That is, $N = E$. The
label of a node $e$ is the label of $e$ in $G$.

— The initial nodes $N_0 \subseteq N$ of $\mathcal{N}$ correspond to the edges of $G$
that exit from an initial state of $S_0$.

— The edges $T$ of $\mathcal{N}$ correspond to pairs of edges of $G$ such that the
target of the first edge is the source of the second.
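
The trivial translation is short enough to write down directly. The following Python sketch assumes that the edges of $G$ are given as (source, label, target) triples; the function name `dual_graph` is hypothetical.

```python
def dual_graph(edges, initial_states):
    """Build the dual graph of a labeled state space.

    edges:          list of (source, label, target) triples of G.
    initial_states: set of initial states of G.
    Returns (nodes, initial_nodes, dual_edges).
    """
    nodes = list(edges)                                   # one node per edge of G
    initial_nodes = [e for e in edges if e[0] in initial_states]
    # Connect e to f whenever the target of e is the source of f.
    dual_edges = [(e, f) for e in edges for f in edges if e[2] == f[0]]
    return nodes, initial_nodes, dual_edges
```

The pairwise comparison is quadratic in the number of edges, which is acceptable for a sketch; indexing the edges by source state would avoid it.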

The above trivial construction does not provide any new insight, since the 
HCMSC graph follows closely the state space and each CMSC node includes 
a single local or unmatched event. We thus look into a translation that would 
construct more reasonable HCMSCs. The translation aims at optimizing the 
following goals: 
1. Minimize the number of unmatched events appearing in the individual CMSC
nodes, if possible obtaining an HCMSC without any unmatched events (how- 
ever, recall from Section 3 that this is not always achievable). 

2. Present relatively long scenarios with the CMSCs, in order to obtain an 
intuitive understanding of the interprocess interaction. 

3. Minimize the number of individual CMSC blocks, so that the HCMSC would 
not become too big. 

Notice that the second and third goals may contradict each other in some systems.
The above 'trivial' translation gives a rather reasonable solution to the third
goal, while providing an unacceptable solution for the second goal. Notice further
that the size of an HCMSC graph can easily get prohibitively large. Thus, in 
practice, the HCMSC construction algorithm should be applied only to small 
parts of communication protocols, rather than to complete protocols. 

It is easy to see that different execution paths in the state space may cor- 
respond to a single CMSC. For example, consider an execution path in which 
we have a send from P1 to P2, then the matching receive, then another send of
the same type, and finally another matching receive. Consider now another ex- 
ecution path, in which we have first the two send transitions, and then the two 
receive transitions. These two paths obviously correspond to the same MSC. 
The partial order reduction algorithms were constructed for this particular rea- 
son. The sleep set method of Godefroid, adapted to our case, is in particular 
appropriate. 



The Algorithm 

Definition 7. For a letter $e \in \Sigma$ (an event), define the set of events
$dep(e)$ that includes exactly the events $f$ such that either $e$ and $f$ are from
the same process, or $e$ and $f$ are a matching pair.

Notice that this definition is tailored for a message passing communication
system and needs to be adapted when using other kinds of concurrency (e.g., with
shared variables).

Let '$\ll$' be a total order over the events in $\Sigma$ satisfying that all the
receive events precede the send events. Denote by $en(s)$ the set of transitions
that are enabled at a state $s$.

1. Make a first guess of a set of nodes such that every cycle must pass through
one of these nodes. One possibility is to set $Z \subseteq S$ to include every node in
which all the queues are empty. Another possibility is to start with the set
that contains just the initial node. One heuristic is to perform a simple DFS
on the state space and include in $Z$ every node that is the target of a back edge.
Notice that this is not optimal (finding a minimal set of such nodes is an
NP-complete problem). The nodes in $Z$ are new cutpoints for the finite state
space in the sense that every cycle must pass through at least one of these points.
Thus, the paths from $Z$ to $Z$ contain no cycles.

2. Start a minimized DFS from nodes in $Z$ or at an initial state. The search
stops at nodes in $Z$ (after progressing at least one step) or at a terminating
node. The minimization algorithm, related to Godefroid's sleep set algorithm
[5] and to the variant of that algorithm presented in [14], is shown in
Figure 6. This version allows removing nodes that have no successors under
the reduction.¹

3. Construct CMSCs for the paths from the nodes in Z according to the paths 
generated during the reduced DFS of the previous step. Since the number 
of paths can be enormous, one can split the reduced graph further, e.g., at 
points that have a relatively large number of incoming or outgoing edges. In 
this way, we generate shorter paths, but possibly more of them. The matching 
algorithm at the end of the section can be used to match corresponding send 
and receive events in the same CMSC. 

4. Connect the separate CMSCs in the following way: If one CMSC ends at
some state $s \in Z$ and another CMSC starts with that state, make an edge
from the former to the latter.
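
The DFS heuristic of step 1 can be sketched as follows. This is an illustrative Python version only, assuming the state space is given as a successor map; the name `back_edge_targets` is hypothetical. It returns the targets of all back edges found by a depth-first search, which is a (not necessarily minimal) set of nodes that every cycle passes through.

```python
def back_edge_targets(succ, roots):
    """Collect every node that is the target of a back edge in a DFS from roots."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on the DFS stack, finished
    color = {}
    Z = set()
    for root in roots:
        if color.get(root, WHITE) != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(succ.get(root, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                c = color.get(nxt, WHITE)
                if c == GREY:             # edge into the current DFS path: back edge
                    Z.add(nxt)
                elif c == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(succ.get(nxt, ()))))
                    break
            else:                         # all successors handled: node is finished
                color[node] = BLACK
                stack.pop()
    return Z
```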

function expand_node(s, sleep);
  local explored, working_set, new_sleep, fixed;
  explored := ∅;
  fixed := false;

  if en(s) = ∅ then return true fi;
  working_set := en(s) \ sleep;
  while working_set ≠ ∅ do
    a := biggest action in working_set according to ≪;
    working_set := working_set \ {a};
    s' := a(s);
    new_sleep := (sleep ∪ explored) \ dep(a);
    explored := explored ∪ {a};
    if s' ∈ Z orelse s' is terminal orelse exists_node(s', new_sleep)
       orelse expand_node(s', new_sleep) then
      fixed := true;
      create_edge((s, sleep), a, (s', new_sleep)) fi;
  end while;

  if fixed then store_node_in_hash(s, sleep) fi;
  return fixed;
end expand_node.



Fig. 6. A reduced state space generation algorithm 
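
For readers who prefer executable code, the procedure of Figure 6 can be transcribed into Python roughly as follows. This is a sketch under assumptions, not the SML/NJ implementation mentioned later: the callbacks `enabled`, `apply_action`, `dep` and `rank` are hypothetical stand-ins for $en(s)$, $a(s)$, $dep(a)$ and the order '$\ll$', and nodes are stored as (state, sleep set) pairs in a Python set.

```python
def reduced_dfs(start, enabled, apply_action, dep, rank, Z):
    """Sleep-set reduced DFS; returns the list of created edges."""
    stored = set()      # hash table of (state, sleep) nodes
    edges = []

    def expand_node(s, sleep):
        if not enabled(s):
            return True
        explored = set()
        fixed = False
        working_set = set(enabled(s)) - sleep
        while working_set:
            a = max(working_set, key=rank)          # biggest action first
            working_set.remove(a)
            t = apply_action(s, a)
            # Keep only the actions that are independent of a.
            new_sleep = frozenset((sleep | explored) - dep(a))
            explored.add(a)
            if (t in Z or not enabled(t) or (t, new_sleep) in stored
                    or expand_node(t, new_sleep)):
                fixed = True
                edges.append(((s, sleep), a, (t, new_sleep)))
        if fixed:
            stored.add((s, sleep))
        return fixed

    expand_node(start, frozenset())
    return edges
```

The recursion terminates only if $Z$ really cuts every cycle of the state space, as required by step 1.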



Properties of the Algorithm. Define the relation '$\to$' between strings
over $\Sigma$ by $\sigma \to \rho$ when $\sigma = vefw$ and $\rho = vfew$, where $v$, $w$
are sequences of transitions, $f$, $e$ are individual transitions, and
$f \notin dep(e)$. Let '$\to^*$' be the transitive and reflexive closure of '$\to$'.

¹ Another change from the original algorithm is that the new nodes are pairs of a
state and a sleep set, and two states that are paired with different sleep sets are
considered different nodes.

The relation '$\sqsubseteq$' between strings over $\Sigma$ is such that
$v \sqsubseteq w$ when

— $v$ is smaller than $w$ according to the alphabetical order based on '$\ll$',

— $w \to^* w'$ for some $w'$, and $v$ is a prefix of $w'$.

Lemma 1. If $v \sqsubseteq w$, then a CMSC with a linearization $v$ is a prefix of a
CMSC with a linearization $w$.

Sketch of proof. We can show that the transitions of each process in $v$ are a
prefix of the transitions of the same process in $w$. ■

Lemma 2. If $v \sqsubseteq w$, $v$ is not a prefix of $w$, and $w$ is generated during
the reduced DFS, then $v$ is not generated by the algorithm.

Sketch of proof. Take the longest common prefix $u$ of $v$ and $w$ ($u$ can be
empty). Let $b$ be the first letter after $u$ in $w$, and $a$ the first letter after $u$ in
$v$. Then from the definition of the relation '$\sqsubseteq$' we have that $a \ll b$,
$a \notin dep(b)$, and $b$ appears in $v$ after $u$, following some sequence of
transitions $u'$ that are not included in $dep(b)$. According to the algorithm,
during the DFS, $ub$ is reached before $ua$. When the search backtracks from $u$, it
has $b$ in its sleep set, since $a \notin dep(b)$. If the search reaches $uu'$, then $b$
is still in the sleep set, since $b$ is independent of all the events in $u'$. Because
of this, $uu'b$ is not generated. ■

Lemma 3. If $v$ is not generated during the search, then there is some $w$ such
that $v \sqsubseteq w$, and $w$ is generated.

Sketch of proof. First, observe that '$\sqsubseteq$' is a reflexive and transitive
relation. The proof is by an induction on the order '$\sqsubseteq$'. Suppose that $v$
is not generated. This is because $v = uu'aw$ for some sequences $u$, $u'$ and $w$,
and a transition $a$, and $a$ was in the sleep set paired with the state obtained
after the reduced DFS has searched $u$. Furthermore, the transition $a$ was taken
after $u$, and is independent of the transitions in $u'$ and is bigger according to
'$\ll$' than the first letter in $u'$. Thus, we have that $v \sqsubseteq uau'w$. Then,
by the induction hypothesis, either $uau'w$ is expanded, or a string $w'$ such that
$uau'w \sqsubseteq w'$ is expanded. But by the transitivity of '$\sqsubseteq$', we have
the result. ■

The Matching Algorithm. Consider a CMSC node $M$ constructed by
the above algorithm. By construction, each path from the initial node to $M$
has the same $t$-deficit, for every type $t$ (since the states of the original finite
state machine refer to the contents of queues). Notice also that every loop in the
HCMSC thus generated has zero $t$-deficit, for any type $t$. Suppose now that the
$t$-deficit of paths from the initial MSC $M_0$ to the predecessors of $M$ is equal to
$d$. Then we match the events in $M$ as follows.

1. Mark the first $d$ receive events of type $t$ in $M$ as 'unmatched' (there may be
fewer than $d$ such messages).

2. Of the remaining send and receive events of type $t$, pair the $i$th send with
the $i$th receive.

3. If there are send events of type $t$ that are unpaired in the previous step,
mark them 'unmatched'.
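
The three matching steps can be written as a few lines of Python. The sketch below assumes that the events of one type $t$ in $M$ are given in their order of occurrence as the strings "send" and "recv"; this encoding and the name `match_events` are illustrative assumptions.

```python
def match_events(events, d):
    """Match send/receive events of one type inside a CMSC node.

    events: list of "send" / "recv" strings in order of occurrence (assumed encoding).
    d:      t-deficit of the paths leading to the predecessors of the node.
    Returns (pairs, unmatched_receives, unmatched_sends) as index lists.
    """
    sends = [i for i, kind in enumerate(events) if kind == "send"]
    recvs = [i for i, kind in enumerate(events) if kind == "recv"]
    unmatched_receives = recvs[:d]        # step 1: first d receives (or fewer)
    remaining = recvs[d:]
    pairs = list(zip(sends, remaining))   # step 2: i-th send with i-th receive
    unmatched_sends = sends[len(remaining):]   # step 3: leftover sends
    return pairs, unmatched_receives, unmatched_sends
```

For example, with events `["recv", "send", "send", "recv"]` and `d = 1`, the first receive is unmatched, the first send is paired with the last receive, and the second send is unmatched.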

8 Conclusion and Implementation

HMSCs are a useful and standard notation for describing executions of com- 
munication protocols. We showed that the requirement of pairing up send and 
receive events in each MSC node prohibits the representation of a simple finite 
state protocol. We presented an extension of the HMSC notation, which we 
call HCMSC. This notation circumvents this problem. With the extension, we 
presented an algorithm for automatically generating the HCMSC structure for 
finite state communication protocols. We have implemented this algorithm as an 
extension of the Pet system [6]. The implementation is written using 800 lines 
of SML/NJ code, and in addition exploits the C code of the MSC/POGA [2] 
system for generating the HCMSC visual structure. 

Acknowledgment. We would like to thank Mihalis Yannakakis, who suggested 
the counterexample in Figure 4, which is simpler than our original counterex- 
ample. 

Abstract. The growing popularity of sequence charts, first of all Message
Sequence Charts and UML Sequence Diagrams, for the description
of communication behavior has evoked criticism regarding the semantics
of the charts, which led to extensions of these standardized visual
formalisms. One such extension is Live Sequence Charts, which allow one to
distinguish mandatory and possible behavior in protocol specifications.
In the original language definition for LSCs the semantics is only
described informally, although a sketch for a possible formalization has
been provided as well. In this paper we intend to fill in the semantic
blanks of the original LSC definition. Following the sketched path we
define the semantics of an LSC by deriving a Timed Büchi Automaton from
it. We also consider qualitative and quantitative timing aspects. We finally
show how LSCs are integrated into a verification tool set for Statemate
designs.



1 Introduction 

In recent years the use of Embedded Control Units (ECUs) has become more and 
more widespread in industry, especially in automotive and avionics applications. 
Many of these ECUs have to satisfy safety critical requirements. Developing such 
systems requires non-trivial effort to ensure correctness of the design. Therefore 
many companies have come to realize the usefulness of (semi-)formal methods 
in the development process of safety critical ECUs. 

One example of a semi-formal specification technique is Message Sequence
Charts (MSCs), a graphical formalism which is concerned with the communi- 
cation behavior of protocols. MSCs have been standardized by the ITU (Inter- 
national Telecommunications Union) in Recommendation Z.120 ([IT96b]). De- 
signed to capture protocol scenarios in the telecommunication area, MSCs can 
also be used to sketch scenarios of general interprocess communication. 

Notwithstanding the fact that there exists a formal semantics [IT96a], we
consider MSCs only a semi-formal specification technique, because there are
still a lot of questions unanswered. For example, one MSC only specifies one
scenario, i.e. one possible communication sequence of the system. But: What
does a collection of MSCs for some system mean? Is progress along the instance
axis enforced? What happens when a condition is evaluated to false? These and
other open questions have been identified by several researchers (see e.g. [Krü99],
[DH99]).

* Research in part supported by the German Research Council (DFG) within the USE
project as part of the SPP Integration of Specification Techniques with Engineering

We follow the Live Sequence Charts (LSC) approach of [DH99]¹, which is
a conservative extension of standard MSCs. As explained in [DH99] LSCs can
be used to distinguish between accepted and non-accepted sequences of variable 
valuations of systems (runs). LSCs extend the formalism of MSCs along the 
following lines: 

— conditions are interpreted, they are not only treated as comments as in MSCs 

— distinction between possible (standard MSC) and mandatory behavior. This 
includes the ability to 

• enforce progress along each instance axis, 

• specify if a message has to be received or not, 

• distinguish between LSCs which show only one possible communication 
(existential interpretation) and LSCs which show mandatory communi- 
cation, i.e. it has to be exhibited by all system runs (universal interpre- 
tation), 

• distinguish between conditions that have to be satisfied and those that 
may be violated without generating an error. 

— specification of activation conditions guarding the activation point of the
LSC, and whether an LSC should be activated only at system start (initial
activation mode) or whenever the activation condition evaluates to true
(invariant activation mode)

The parts of an LSC which may be interpreted either as mandatory or possi- 
ble are assigned a temperature, hot for mandatory and cold for possible elements. 
Thus we have messages, conditions and instance locations with temperatures. 
Graphically cold elements are depicted by dashed lines whereas hot ones are 
represented as solid lines (see figure 1). 

We will substantiate the original paper [DH99] in two ways: We will on one 
hand provide a more concrete semantics for a subset of the original features using 
Timed Büchi Automata. On the other hand we will introduce timing annotations
which are not covered by [DH99]. Our notion of time is discrete as we base our 
time model on the steps of a system exhibiting time-discrete behavior. 

Before we go into the details of the process of transforming the LSC into
an automaton format (which we call unwinding) we need to explain the context
in which we want to use LSCs. At the University of Oldenburg/OFFIS², the
Statemate Verification Environment (STVE) has been developed over the last
years, which allows the verification of safety-critical properties of Statemate
designs (see [BBea99], [DDK99] or [DK01] for details). The Statemate tool from
i-Logix is a commercial CASE tool which is based on Harel's statecharts [HP96]
and is widely used in industry. The STVE translates the Statemate design into the
input format of the underlying symbolic model checker and also lets the designer
specify the properties to be verified in a graphical way. For this purpose Symbolic
Timing Diagrams (STDs) are used, which have been developed at OFFIS as well.
STDs allow stating constraints on the changes of values of inputs and outputs
of the system under design; see [Fey96], [FJ97] for more information. The STD
specification is translated into propositional temporal logic (see [Fey96]) which
serves as input for the model checker.

¹ A newer version of this paper is to appear this year: [DH01]

² Oldenburger Forschungs- und Entwicklungsinstitut für Informatik-Werkzeuge und
-Systeme

The algorithm for unwinding LSCs explained in this paper is already
implemented in a tool (cf. section 5), which is being integrated into the STVE at
the time of writing. The roles of LSCs and STDs are complementary: STDs talk
about one component (black box view) whereas LSCs are obviously much better
suited to express properties about the interactions of components (white box view).

The paper is structured as follows: In section 2 we define the subset of LSC
features considered here. Section 3 describes how the unwinding structure for
an LSC is constructed and section 4 shows the subsequent transformation into
a Timed Büchi Automaton. In section 5 we give an overview of the tool
environment for verification of Statemate designs against LSC specifications.
We give a summary and identify directions of further research in section 6.



2 LSC Subset 

LSCs as presented in [DH99] provide a rich set of features. We will only treat
a subset of these in the present paper in order to focus on the core concepts.
The integration of LSCs into the STVE entails some other restrictions which are
caused by Statemate. But we also add two features which were not covered
in the original paper. The first is timing annotations, which allow specifying a
lower and an upper bound between two subsequent locations on one instance. We
borrow the interval notation used in both STDs [Fey96] and MSCs [AHP96]. Timer
durations are consequently interpreted as a multiple of our discrete base time unit.
Our interpretation of timing annotations and timers allows the user to specify
runs of a system not only in a qualitative manner, but also to constrain event
sequences by quantitative time bounds. The second new concept is simultaneous
regions, which allow specifying the simultaneous observation of several events. In
contrast, the non-deterministic symbolic transition system presented in [DH99]
implements a pure interleaving interpretation of LSCs: only a single instance is
allowed to proceed at a time. We feel that such an interpretation is not powerful
enough in the context of Statemate, where the communicating activities run in
parallel and can change arbitrarily many variable values at the same time. Besides
explicit simultaneous regions, our interpretation considers unordered events of a
coregion or of different instances observable in any order including simultaneity.

In this paper we do not treat existential LSCs, because the universal
interpretation seems to be the natural choice for formal verification as we want to
prove that the entire system fulfills the specification. We feel that the intention
of using an existential LSC is to get a satisfying run as a witness. This would
entail a modified unwinding algorithm which is out of scope for the present
paper. Our algorithm can also handle sub-charts, the details of which we omit here
due to limited space.

The following concepts are contained in our approach (see figure 1 for the
graphical representation of the concepts): Hot and cold locations, hot and cold
messages, hot and cold conditions, coregions, simultaneous regions, timers, timing
annotations.




Fig. 1. LSC example 

We restrict the setting of a timer to be bound (via a simultaneous region) 
to some sort of event. This is what we feel is the intention which was so far not 
expressible in MSCs: A timer is set when some event is observed and we then 
wait for some subsequent event. 

3 Constructing the Unwinding Structure 

In order to generate the unwinding structure from an LSC we first need to 
identify its building blocks. They are those elements of the chart which have 
to happen simultaneously, i.e., which are indivisible. In our case the simultaneous
region construct covers the majority of these elements since it encompasses
both regular messages and regular conditions. The other two elements left, the 
instance head and the instance end, are of a more auxiliary nature. They are 
not unwound explicitly but the set of all instance heads/instance ends forms 
the start state/end state for the unwinding structure. The other elements of our 
LSC subset are either irrelevant for the structure or can be stated using the 
simultaneous region. Timer and timing annotations are not treated in the un- 
winding, as their timing information is considered later when transforming the 
unwinding structure into an automaton. Actions are disregarded because they 
hold no information relevant for the run. Coregions are not treated as constructs 
of their own but as the separate simultaneous region constructs of which they
are comprised. Neither do sub-charts form a separate construct as they can be 
expressed by their enclosed simultaneous region constructs as well. 

The construction of the unwinding structure borrows the basic technique 
from the unwinding of Symbolic Timing Diagrams (cf. [Fey96]) using Phases. 
Informally a Phase shows how far the unwinding of the LSC has progressed. 
Starting with the Initial Phase all possible successor Phases are computed and 
this is iterated until the Final Phase is reached. 

For the formal definition of the unwinding procedure we first introduce the 
concept of a position in an LSC: A position is simply a graphical point in an 
LSC. The atomic building blocks of an LSC are events which may be one of 
the following: (1) instance head, (2) instance end, (3) sending a message, (4)
receiving a message, (5) the valuation of a condition, (6) setting a timer, (7)
expiration of a timer or (8) the reset of a timer.

In order to formally define events we introduce a number of sets. The
first group of sets contains elements which are related to sets of positions,
whereas the second group contains elements which are related to single positions.

Sets of objects of LSC $l$

related to sets of positions              related to single positions
----------------------------------------  ----------------------------------------------
$\mathit{Instances}(l)$   set of instances     $\mathit{Msgsend}(l)$       set of message sendings
$\mathit{Messages}(l)$    set of messages      $\mathit{Msgrecv}(l)$       set of message receipts
$\mathit{Conditions}(l)$  set of conditions    $\mathit{Set\_Timer}(l)$    set of timer settings
                                          $\mathit{Reset\_Timer}(l)$  set of timer resets
                                          $\mathit{Timeouts}(l)$      set of timeouts
                                          $\mathit{Timer}(l)$         $\mathit{Set\_Timer}(l) \cup \mathit{Reset\_Timer}(l) \cup \mathit{Timeouts}(l)$

Analogously to the chart-oriented sets of the second group we define the same
sets for each instance $i$ of the LSC $l$. For conditions we define the set of condition
valuations which are local to instance $i$ as the restriction of the shared condition
for the whole chart to instance $i$: $\mathit{Conds}(i) := \mathit{Conditions}(l)|_i$. We write
$\mathit{Conds}(l)$ for $\bigcup_{i \in \mathit{Instances}(l)} \mathit{Conds}(i)$. We denote the
instance head of instance $i$ by $\bot_i$ and the instance end of $i$ by $\top_i$.



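Events Given these basic definitions we now formalize events:

For LSC $l$:

$$\mathit{Events}(l) := \{\bot_i \mid i \in \mathit{Instances}(l)\} \cup \mathit{Msgsend}(l) \cup \mathit{Msgrecv}(l) \cup \mathit{Timer}(l) \cup \mathit{Conds}(l) \cup \{\top_i \mid i \in \mathit{Instances}(l)\}$$

For instance $i$:

$$\mathit{Events}(i) := \{\bot_i\} \cup \mathit{Msgsend}(i) \cup \mathit{Msgrecv}(i) \cup \mathit{Timer}(i) \cup \mathit{Conds}(i) \cup \{\top_i\}$$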



With each event $e \in \mathit{Events}(i)$ we associate its position given by the function
$\mathit{position}(e)$. In order to handle simultaneous observation of multiple events we
introduce for instance $i$ the maximal set:

$$\mathit{Sim\_Regions}(i) := \{sr \subseteq \mathit{Events}(i) \mid \forall x, y \in sr : \mathit{position}(x) = \mathit{position}(y)\}.$$

We also associate a position with each $sr \in \mathit{Sim\_Regions}(i)$ by the
function $\mathit{position}(sr)$. Note that simultaneous regions are sets of basic events.
Single events are treated as singleton simultaneous regions. Positions along one
instance axis are totally ordered.
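
One way to read this definition operationally is to group the events of an instance by their position, so that each group is one simultaneous region. The following Python sketch follows that reading; the function name `sim_regions` and the use of sortable numbers as positions are assumptions made for illustration.

```python
from collections import defaultdict

def sim_regions(events, position):
    """Group the events of one instance into simultaneous regions.

    events:   iterable of event identifiers (hypothetical encoding).
    position: function mapping an event to a sortable position.
    Returns the regions as frozensets, ordered along the instance axis.
    """
    groups = defaultdict(set)
    for e in events:
        groups[position(e)].add(e)       # same position means same region
    return [frozenset(groups[p]) for p in sorted(groups)]
```

A single event at a position of its own ends up in a singleton region, as stated above.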

We now consider the set $\mathit{Coregions}(i)$ of coregions of instance $i$. A coregion
$cr \in \mathit{Coregions}(i)$ starts at the graphical position $x$ of instance $i$ and ends at
graphical position $y$. We define for $cr$: $\mathit{startpos}(cr) := x$, $\mathit{endpos}(cr) := y$
and $\mathit{position}(cr) := \mathit{startpos}(cr)$. For each $cr \in \mathit{Coregions}(i)$ we then
define

$$\mathit{contents}(cr) := \{sr \in \mathit{Sim\_Regions}(i) \mid \mathit{position}(cr) < \mathit{position}(sr) < \mathit{endpos}(cr)\}.$$

In addition to a graphical position we associate a logical position with each
simultaneous region which is used to determine the order along the instance
axis. We call this logical position the location of the simultaneous region. For
$sr \in \mathit{Sim\_Regions}(i)$ we define:

$$\mathit{location}(sr) := \begin{cases} \mathit{position}(sr) & \text{if } \nexists\, cr \in \mathit{Coregions}(i) : sr \in \mathit{contents}(cr) \\ \mathit{position}(cr) & \text{if } \exists\, cr \in \mathit{Coregions}(i) : sr \in \mathit{contents}(cr) \end{cases}$$


Let $\mathit{Locations}(i) := \{\mathit{location}(sr) \mid sr \in \mathit{Sim\_Regions}(i)\}$ be the set
of locations of instance $i$. Note that for distinct $sr, sr' \in \mathit{Sim\_Regions}(i)$ not
located in the same coregion either $\mathit{location}(sr) < \mathit{location}(sr')$ or
$\mathit{location}(sr') < \mathit{location}(sr)$ holds. Moreover, for distinct
$sr, sr' \in \mathit{Sim\_Regions}(i)$ located in the same coregion:
$\mathit{location}(sr) = \mathit{location}(sr')$. Thus, with respect to coregions locations
along one instance axis are ordered only partially. Concerning simultaneous
regions and coregions we formulate the following well-formedness rules:



— Simultaneous regions located in a coregion must be singleton sets, or in other 
words, must be single events, because otherwise they would impose an order 
(simultaneity) on some of the events in the coregion. 

— At most one condition valuation may be located in a simultaneous region. 
Several condition valuations can always be merged into one. 

— Timer settings must only occur in simultaneous regions together with at 
least one non-timer setting event, because there has to be some reference 
point for the timer. 

For the unwinding procedure we furthermore need to know the predecessors
of each simultaneous region, which are determined by the following function
($sr \in \mathit{Sim\_Regions}(i)$)³:

$$\mathit{predecessor}(sr) := \begin{cases} \emptyset & \text{if } sr = \{\bot_i\} \\ \{sr' \mid sr' \in \mathit{Sim\_Regions}(i) \land \mathit{location}(sr') < \mathit{location}(sr) \land \neg\exists f \in \mathit{Sim\_Regions}(i) : \mathit{location}(sr') < \mathit{location}(f) < \mathit{location}(sr)\} & \text{else} \end{cases}$$

³ Note that the set of predecessors usually contains just one element. Only a coregion
as predecessor produces a set containing several elements.
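
The location and predecessor functions can be illustrated with the following Python sketch. It assumes that simultaneous regions are identified by names, that positions are numbers, and that coregions are given as (startpos, endpos) pairs; these encodings and the function names are hypothetical.

```python
def location(sr, position, coregions):
    """Logical position: the coregion start if sr lies inside a coregion."""
    p = position[sr]
    for start, end in coregions:
        if start < p < end:
            return start
    return p

def predecessors(sr, regions, position, coregions, head):
    """Regions with the greatest location strictly below that of sr."""
    if sr == head:                       # the instance head has no predecessor
        return set()
    loc = lambda r: location(r, position, coregions)
    earlier = [r for r in regions if loc(r) < loc(sr)]
    if not earlier:
        return set()
    latest = max(loc(r) for r in earlier)
    return {r for r in earlier if loc(r) == latest}
```

With a coregion spanning positions 1 to 4 that contains two regions, both are returned as predecessors of the region that follows the coregion, which matches the remark in the footnote.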
This definition of a predecessor set for each location allows us to determine sets
of legal event sequences along one instance axis. But simultaneous regions of 
one instance can be bound to other simultaneous regions on other instances. 
For example, shared conditions involve a simultaneous region on each instance 
sharing the condition. If one of the simultaneous regions of this set occurs, the 
other simultaneous regions of the set must occur simultaneously. This leads to 
the definition of 



Simultaneous Classes Let $\sim_{LSC}$ denote the equivalence relation "has to
happen simultaneously", then:

$$\mathit{Sim\_Classes}(l) := \{scl \subseteq \bigcup_{i \in \mathit{Instances}(l)} \mathit{Sim\_Regions}(i) \mid \forall sr, sr' \in scl : sr \sim_{LSC} sr'\}$$

The simultaneous classes impose the ordering between different instances. Here
the constructs that satisfy $\sim_{LSC}$ are

— shared condition valuations c G Conditions (1) which form a synchronization 
barrier, since c has to be evaluated simultaneously at each involved instance 

— sending and receiving a message m G Messages(l). This is only legal for 
models with zero delay communication like Statemate. For delayed (syn- 
chronous or asynchronous) communications sending and receipt of a mes- 
sage have to be treated separately. For simplicity’s sake we only consider 
non-delayed communication in the remainder of this paper. 

— a singleton simultaneous region 

Note that $\forall sr \in \bigcup_{i \in \mathit{Instances}(l)} \mathit{Sim\_Regions}(i)\ \exists\, scl \in \mathit{Sim\_Classes}(l) : sr \in scl$.
For $scl \in \mathit{Sim\_Classes}(l)$ we now define the set of simultaneous classes
which have to be unwound before $scl$. We call this set the prerequisites for $scl$.

$$\mathit{prerequisite}(scl) := \begin{cases} \emptyset & \text{if } scl \in \wp(\bigcup_{i \in \mathit{Instances}(l)} \{\bot_i\}) \\ \{scl' \mid scl' \in \mathit{Sim\_Classes}(l) \land \exists sr \in scl\ \exists sr' \in scl' : sr' \in \mathit{predecessor}(sr)\} & \text{else} \end{cases}$$

With these definitions we are now fully equipped to formally define the
unwinding procedure. The procedure unwinds an LSC step by step by constructing
sets of simultaneous classes.



Unwinding Sets Each step in the unwinding process is characterized by three
sets:

— $\mathit{History} \subseteq \mathit{Sim\_Classes}(l)$, the set of simultaneous classes which
have already been unwound

— $\mathit{Ready} \subseteq \mathit{Sim\_Classes}(l)$, the set of simultaneous classes whose
prerequisites have already been unwound

— $\mathit{Fired} \in \wp^+(\mathit{Ready})$, the set of simultaneous classes which are unwound
in the current phase⁴

⁴ $\wp^+$ denotes the power set without the empty set.
In addition to these three sets we introduce the Cut through the LSC which
keeps track of the progress of the unwinding, i.e. the Cut identifies the border
line between elements which have already been unwound and those that still
have to be considered. A cut can be visualized as a piece of rope lying across the
LSC touching exactly one location of each instance. More formally we define the
set of all cuts of an LSC $l$ as:

$$\mathit{Cuts}(l) := \{(x_1, \ldots, x_n) \mid x_j \in \mathit{Sim\_Regions}(j),\ 1 \le j \le n = |\mathit{Instances}(l)|\}$$

An unwinding phase $\mathit{Phase}_i$ consists of the sets $\mathit{Ready}_i$, $\mathit{History}_i$ and the
vector $\mathit{Cut}_i$. Each phase is represented as a node in the unwinding structure
which is therefore annotated with the triple $(\mathit{Ready}_i, \mathit{History}_i, \mathit{Cut}_i)$; the
phases are connected by edges annotated with elements of $\mathit{Fired}$. Thus we have
the following sets:

— $\mathit{Phases}(l)$ - set of all possible phases for LSC $l$

— $\mathit{Fireds}(l)$ - set of all possible fired-sets for LSC $l$

— $\mathit{Cuts}(l)$ - set of all possible cut-vectors for LSC $l$



Computing the Phases Each unwinding step entails the computation of the
successor phase(s) from the present one, starting with the initial phase and ending
with the final phase. The ready set of the initial phase contains all simultaneous
classes that have only instance heads as prerequisites; its history and cut
contain all instance heads:

$$Phase_0 = (Ready_0, History_0, Cut_0), \text{ where}$$

$$Ready_0 = \{a \in Sim\_Classes(l) \mid prerequisite(a) \in \wp^{+}(\textstyle\bigcup_{i \in Instances(l)} \{\bot_i\})\}$$

$$History_0 = \{\textstyle\bigcup_{i \in Instances(l)} \{\{\bot_i\}\}\}$$

$$Cut_0 = (\bot_1, \ldots, \bot_n), \text{ where } n \text{ is the number of instances of LSC } l$$

A non-initial and non-final phase $Phase_j$ is characterized by

$$Phase_j = (Ready_j, History_j, Cut_j), \text{ where}$$

$$Ready_j = \{a \in Sim\_Classes(l) \mid \forall b \in prerequisite(a) : b \in History_j \wedge a \notin History_j\}$$

$$History_j \subseteq Sim\_Classes(l)$$

$$Cut_j = (x_1, \ldots, x_n),\ \forall k \in Instances(l) : x_k \in Sim\_Regions(k)$$

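The final phase $Phase_{final}$ is characterized by

$$Phase_{final} = (Ready_{final}, History_{final}, Cut_{final}), \text{ where}$$

$$Ready_{final} = \{\textstyle\bigcup_{i \in Instances(l)} \{\{\top_i\}\}\}$$

$$History_{final} = Sim\_Classes(l) \setminus \{\textstyle\bigcup_{i \in Instances(l)} \{\{\top_i\}\}\}$$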

An unwinding step from $Phase_i$ to $Phase_j$ is thus defined by the function

$$Step : Phases(l) \times \wp^{+}(Sim\_Classes(l)) \rightarrow Phases(l),$$

where

$$Step((Ready_i, History_i, Cut_i), Fired_i) = (Ready_j,\ History_i \cup Fired_i,\ upd(Cut_i, Fired_i)),$$

with $upd : Cuts(l) \times Fireds(l) \rightarrow Cuts(l)$ and
$upd(Cut_i, Fired_i) := (x'_1, \ldots, x'_n)$, where

$$x'_k = \begin{cases} x''_k, & \text{if } \exists z \in Fired_i\ \exists scl \in z : x''_k \in scl \\ x_k, & \text{else} \end{cases} \qquad k = 1, \ldots, n$$



For each subsequent unwinding step first a subset of the ready set is selected. 
This subset represents the simultaneous classes which are unwound in the current 
step. All possible subsets except the empty set are considered to determine the 
set of next nodes in the unwinding structure. The source node is connected to 
all its successor nodes by an edge annotated with the set of simultaneous classes 
unwound in this step. The new ready set is then computed for all successor nodes 
in the following manner: First the simultaneous classes which have just been 
unwound have to be removed from the ready set, then all simultaneous classes 
whose prerequisites are now fulfilled are added. In the history the just unwound 
simultaneous classes are recorded, whereas for the Cut the simultaneous classes 
that have just been unwound are added and the ones just left are removed. The 
unwinding structure concludes with the final node. Notice that the resulting 
structure contains one path for each possible ordering — including simultaneity! 
— of unordered events. 
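
The procedure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the implementation used in the LSC compiler: simultaneous classes are plain strings, instance heads and the Cut are left out, and a node is identified by its history alone (the ready set follows from the history). The names `unwind`, `ready_set`, `nonempty_subsets` and the example chart are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def ready_set(prereq, history):
    """Classes not yet unwound whose prerequisites are all in the history."""
    return frozenset(c for c, pre in prereq.items()
                     if c not in history and pre <= history)


def unwind(prereq):
    """Return (nodes, edges) of a simplified unwinding structure.

    prereq maps each simultaneous class to the set of classes that must
    be unwound before it. Nodes are histories; edges are
    (source_history, fired_set, target_history) triples.
    """
    start = frozenset()
    nodes, edges, todo = {start}, [], [start]
    while todo:
        history = todo.pop()
        # Every non-empty subset of the ready set yields one successor.
        for fired in nonempty_subsets(ready_set(prereq, history)):
            target = history | fired
            edges.append((history, fired, target))
            if target not in nodes:
                nodes.add(target)
                todo.append(target)
    return nodes, edges


# Hypothetical chart: m1 first, then m2 and m3 in a coregion.
prereq = {"m1": set(), "m2": {"m1"}, "m3": {"m1"}}
nodes, edges = unwind(prereq)
```

For this chart the sketch produces one path for m2 before m3, one for m3 before m2, and one edge on which both are fired simultaneously.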





Fig. 2. Simple unwinding example: (a) example LSC, (b) corresponding unwinding structure
Figure 2 shows a simple unwinding example. We have omitted the node
annotations to increase readability. m2 and m3 are located in a coregion: they
may be observed in any order and, in our interpretation, also simultaneously^.



Optimizing the Structure The unwinding structure for an LSC can become 
very broad if the LSC contains large coregions or many elements which may 
be executed in parallel. As a consequence it may contain identical substruc- 
tures in different branches. In order to streamline the structure these identical 
substructures should be merged. But this merge may only be performed if the 
substructures represent the same unwinding step, i.e. they must have the same 
history and ready sets. 



Temperatures Each location is annotated with a temperature indicating if 
progress is enforced along the instance axis. When the events associated with 
a location are observed, a hot temperature at the location requires the events 
located at the following location to occur eventually. A cold temperature allows 
the events located at the following location not to occur at all, but if they are 
observed they must be observed after the events of the previous location. 

Let $temp(i, x) \in \{hot, cold\}$ be the temperature of location $x$ at instance $i$. We
now define the temperature of a cut by

$$temp(Cut_j) := \begin{cases} hot, & \text{if } \exists i \in Instances(l)\ \exists x_i \in Locations(i) : x_i \in Cut_j \wedge temp(i, location(x_i)) = hot \\ cold, & \text{else} \end{cases}$$

The temperature of a phase is defined to be the temperature of its Cut. We can
now define the set of cold phases of LSC $l$:

$$ColdPhases(l) := \{ph \in Phases(l) \mid ph = (Ready_{ph}, History_{ph}, Cut_{ph}) \wedge temp(Cut_{ph}) = cold\}$$

By definition, $temp(Phase_{final}) = cold$.
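
The temperature definitions can be illustrated with a small Python sketch. It assumes a simplified encoding that is not taken from the paper: a cut is a dictionary from instance names to the location it touches, and `temp` maps (instance, location) pairs to a temperature. All names and the example data are hypothetical.

```python
HOT, COLD = "hot", "cold"


def cut_temperature(cut, temp):
    """A cut is hot if any location it touches is hot, otherwise cold.

    cut  : dict instance -> location currently touched by the cut
    temp : dict (instance, location) -> HOT or COLD
    """
    return HOT if any(temp[(i, x)] == HOT for i, x in cut.items()) else COLD


def cold_phases(phases, temp):
    """Select the phases whose cut is cold; phases maps name -> cut."""
    return {name for name, cut in phases.items()
            if cut_temperature(cut, temp) == COLD}


# Hypothetical two-instance chart with one hot location on instance A.
temp = {("A", 0): COLD, ("A", 1): HOT, ("B", 0): COLD, ("B", 1): COLD}
phases = {
    "p0": {"A": 0, "B": 0},
    "p1": {"A": 1, "B": 0},
    "p2": {"A": 1, "B": 1},
}
```

The cold phases selected this way are the candidates for accepting states in the automaton construction of the next section.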

4 From the Unwinding Structure to a Timed Büchi
Automaton

The unwinding structure is an intermediary data structure which is incomplete
since it cannot express time. This motivates the following transformation of the
unwinding structure into a Timed Büchi Automaton (TBA), which serves as an
intermediate format to the STVE. There exists a translation of the TBA format
into propositional temporal logic (for details see [Fey96]).

Before we formally introduce Timed Büchi Automata we describe how timing
information is added, since this is the key procedure in transforming the
unwinding structure into a TBA.

^ For delayed communication we would have to put the send and receive events of the 
messages in separate simultaneous classes. 
4.1 Adding Timing Information

Hot locations can be annotated by timing annotations, which are interval
notations of the form [n,m], where n,m are non-negative integers and n < m. The
notation is similar to the one used for constraint intervals in STDs [Fey96] or
for specifying delays in MSCs [AHP96]. The meaning of a timing annotation is:
after the events located at the annotated location have been observed, the events
located at the following location are observed at least n and at most m steps later.

For each location of the LSC an integer clock is introduced. This clock is 
reset when the location is unwound, i.e. when an event located at this location is 
observed. A boolean expression constrains the clock value to be in the specified 
range when the next location along the instance axis is entered (i.e. when an 
event located at the following location is observed). The boolean expression is 
simply true if the location does not have a timing annotation. For treatment of 
timing annotations we therefore define: 

- For $i \in Instances(l)$ and $x \in Locations(i)$ let $clk(i, x)$ be the unique clock
identifier denoting the clock associated with the location $x$ of instance $i$.

- The set of clock names is given by

$$Clocks(l) := \bigcup_{i \in Instances(l)} \bigcup_{x \in Locations(i)} \{clk(i, x)\}$$

- For $i \in Instances(l)$ and $x \in Locations(i)$ let $t\_ann(i, x)$ be the timing
annotation for location $x$ of instance $i$. Note that $t\_ann(i, x) = \varepsilon$ if location $x$
of instance $i$ is not annotated with a timing annotation. Otherwise $t\_ann(i, x)$
is of the form [n, m], with n < m, n, m integers.

- Let $clk\_resets(Fired_j) := \{clk(i, x) \mid \exists z \in Fired_j\ \exists scl \in z\ \exists sr \in scl : location(sr) = x \wedge x \in Locations(i)\}$,
with $scl \in Sim\_Classes(l)$, $sr \in Sim\_Regions(i)$, be the set of names of clocks
which are reset when $Fired_j$ is unwound. I.e. for each location reached with
$Fired_j$ the corresponding clocks are reset.

- Let $Clk\_Resets(l) := \bigcup_{f \in Fireds(l)} clk\_resets(f)$ be the set of all clocks which
are reset in LSC $l$.

- Let $clk\_conds(Fired_j) := \{t\_ann(i, x) \mid \exists z \in Fired_j\ \exists scl \in z\ \exists sr \in scl : x \in predecessor(location(sr))\}$,
with $scl \in Sim\_Classes(l)$, $sr \in Sim\_Regions(i)$, be the set of timing annotations
to be considered when unwinding $Fired_j$.

- Let $Clk\_Conds(l) := \bigcup_{f \in Fireds(l)} clk\_conds(f)$ be the set of all timing
annotations in LSC $l$.
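
How the clock resets and clock conditions interact during one unwinding step can be sketched as follows. This is a minimal illustration under assumptions of our own: clocks are integer counters in a dictionary, a timing annotation [n, m] is read as the inclusive range n..m, and the helper names `tick` and `check_step` as well as the clock names are hypothetical.

```python
def tick(clocks):
    """Let one step of time pass: every clock advances by one."""
    for name in clocks:
        clocks[name] += 1


def check_step(clocks, resets, conds):
    """Check the timing annotations of one unwinding step, then reset clocks.

    clocks : dict clock name -> integer value (steps since last reset)
    resets : clock names reset by this step (clk_resets in the text)
    conds  : (clock name, n, m) triples, each requiring n <= value <= m
             (clk_conds in the text, assumed to be inclusive bounds)
    Returns True iff every condition holds; clocks is updated in place.
    """
    ok = all(n <= clocks[name] <= m for name, n, m in conds)
    for name in resets:
        clocks[name] = 0
    return ok


# Location 1 of instance A carries the annotation [2, 5]; its clock was
# reset when location 1 was unwound, and location 2 is entered 3 steps later.
clocks = {"clk_A_1": 0, "clk_A_2": 0}
for _ in range(3):
    tick(clocks)
in_time = check_step(clocks, {"clk_A_2"}, [("clk_A_1", 2, 5)])

# Waiting six more steps violates the same annotation.
for _ in range(6):
    tick(clocks)
too_late = check_step(clocks, set(), [("clk_A_1", 2, 5)])
```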

Finally let us note that timers may be treated in a way quite similar to timing
annotations. Figure 3 shows an example; note that we are dealing with a TBA
instead of an unwinding structure here. The TBA definition as well as the exact
treatment of clocks will be demonstrated in section 4.2.



4.2 Timed Büchi Automaton Definition

A Timed Büchi Automaton ([Alu98],[AD92]) $A$ is a tuple
$A = (\Sigma, S, s_0, C, \rightarrow_{TBA}, F)$, where
Fig. 3. Unwinding timing annotations and timers: (a) example LSC with timing info, (b) corresponding TBA



- $\Sigma$ is the alphabet

- $S$ is the set of states

- $s_0 \in S$ is the initial state

- $C$ is a set of clocks

- $\rightarrow_{TBA} : S \times Pred \times \wp(C) \times \wp(Conds(C)) \rightarrow S$ is the transition function.
$Pred$ are predicates ranging over $\Sigma$. $Conds(C)$ are predicates constraining
clocks of $C$. A transition $(s, p, c, cond) \rightarrow_{TBA} s'$ represents the change from
state $s$ to state $s'$ for observation $p$. The clocks in $c$ are reset when taking
the transition and the transition can only be taken if the clock constraints
in $cond$ hold.

- Finally $F \subseteq S$ is the set of accepting states
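
The tuple above maps naturally onto a small data structure. The following Python sketch is an illustration only: the class `TBA`, its method `successors`, the encoding of clock conditions as (clock, low, high) triples, and the choice that every clock advances by one whenever a transition is taken are assumptions made for this example, not part of the formal definition.

```python
from dataclasses import dataclass, field


@dataclass
class TBA:
    initial: str
    accepting: set
    # Each transition is (source, predicate, clocks_to_reset,
    # clock_conditions, target). The predicate takes the observed set of
    # symbols; each clock condition is a (clock, low, high) triple.
    transitions: list = field(default_factory=list)

    def successors(self, state, observation, clocks):
        """Yield (target, new_clocks) for every enabled transition.

        Conditions are checked on the current clock values; afterwards
        all clocks advance by one and the listed clocks are reset to 0.
        """
        for src, pred, resets, conds, dst in self.transitions:
            if src != state or not pred(observation):
                continue
            if all(lo <= clocks[c] <= hi for c, lo, hi in conds):
                new = {c: 0 if c in resets else v + 1
                       for c, v in clocks.items()}
                yield dst, new


# Hypothetical automaton: observe a, then b within 1 to 2 steps.
tba = TBA(
    initial="s0",
    accepting={"s2"},
    transitions=[
        ("s0", lambda o: "a" in o, {"x"}, [], "s1"),
        ("s1", lambda o: "b" not in o, set(), [], "s1"),   # self loop
        ("s1", lambda o: "b" in o, set(), [("x", 1, 2)], "s2"),
    ],
)
```

The self loop on s1 is what lets the clock x advance while the automaton waits for b, which matches the remark that time only passes when a transition is taken.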

Informally the TBA for an LSC is derived from its unwinding structure by

- renaming the nodes with a fresh set of phase names

- changing the edge annotation to the conjunction of the elements of the
corresponding fired set

- adding a dedicated Exit state for violated cold conditions

- determining the set of accepting states according to the Büchi acceptance
condition

- adding self loops on each state and labeling them with the condition which
has to hold while the TBA stays in the associated state^.

A TBA $A_{LSC}$ for an LSC $l$ is a tuple $A_{LSC} = (\Sigma, S, s_0, C, \rightarrow_{TBA}, F)$, where

- $\Sigma := Sim\_Classes(l)$

- $S := Ph\_names(l) \cup \{Exit\}$, where $Ph\_names(l)$ is a set of fresh identifiers
and $phname$ is the function that associates a name from $Ph\_names(l)$
with each phase of the unwinding structure, i.e.
$\forall p \in Phases(l)\ \exists p' \in Ph\_names(l) : p' = phname(p)$.

- $s_0 := phname(Phase_0)$

- $C := Clocks(l)$

- $\rightarrow_{TBA} : S \times Pred \times \wp(Clk\_Resets(l)) \times \wp(Clk\_Conds(l)) \rightarrow S$, where
$Pred$ is built from $f \in Fireds(l)$ by conjunction (and negation) of the
elements of $f$.

- $F := \{phname(cph) \mid cph \in ColdPhases(l)\}$

^ The self loops are needed because time only passes when a transition is taken.
Otherwise time could not progress in a state.

Up to here we did not mention how we handle the activation mode and activation
condition of an LSC. The language definition [DH99] provides the activation
modes initial and invariant. An initial LSC is activated at system initialization,
while an invariant LSC is activated whenever its activation condition evaluates
to true; note that the kernel automata are identical in both cases. Activation
mode and condition must be taken into account when generating the temporal
logic formulae from the TBAs [Fey96]. Thus we need to preserve this information
in the TBA format and extend it with an activation predicate.



4.3 Determinism in the TBA 

Adding the self loops for each state in the TBA raises the question of what 
annotation should be put on the self loop. This is closely related to the question 
of determinism of the TBA. There are three options of what the transition 
annotation should be: First, the annotation could be omitted altogether - this 
would be equivalent to a true annotation - resulting in a very non-deterministic 
TBA. The true annotation does not require the TBA to take a transition when 
the corresponding message has been observed. This non-deterministic behavior 
is obviously too weak, so we need a stronger interpretation. 




Fig. 4. Different annotation types for self loops 
The interpretation corresponding to [DH99], where each occurrence of a message
has to be explicitly noted in the LSC and no other occurrence is allowed,
we call strict. This is achieved by annotating each self loop with the conjunction
of the negation of all messages appearing in the LSC (cf. figure 4(c)). This
interpretation may be too strong in certain cases where we do not care if messages
visible in the LSC occur at any other time, as long as the ordering imposed by the
LSC is satisfied. This leads to the weak interpretation, where the self loop
annotation only contains the negation of the next message(s). This forces the TBA
to react to the first occurrence of the message that is expected next, but does
not restrain the occurrence of this message at other times (cf. figure 4(b)).
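
The three self loop annotations can be compared side by side in a short Python sketch. The function name `self_loop_predicate`, the mode strings and the message names are hypothetical; an observation is modeled as the set of messages seen in one step, which is an assumption of this example.

```python
def self_loop_predicate(mode, all_messages, next_messages):
    """Return a predicate over the set of messages observed in one step.

    mode == "true"   : the loop may always be taken (no annotation)
    mode == "weak"   : stay only while none of the expected next
                       messages occurs
    mode == "strict" : stay only while no message of the chart occurs
    """
    if mode == "true":
        return lambda observed: True
    if mode == "weak":
        blocked = set(next_messages)
    elif mode == "strict":
        blocked = set(all_messages)
    else:
        raise ValueError(f"unknown mode {mode!r}")
    return lambda observed: not (blocked & set(observed))


# Hypothetical chart with messages m1, m2, m3; the state waits for m2.
chart = {"m1", "m2", "m3"}
weak = self_loop_predicate("weak", chart, {"m2"})
strict = self_loop_predicate("strict", chart, {"m2"})
loose = self_loop_predicate("true", chart, {"m2"})
```

Observing m3 while waiting for m2 keeps the automaton in its state under the weak reading but not under the strict one; observing m2 forces both to leave the self loop.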

These different degrees of determinism only concern messages. For conditions 
we always annotate the self loop with true (cf. figures 4(c) and 4(b)), because we 
do not know when to evaluate a condition. This problem is inherent in the LSCs 
where there is only the possibility to specify if a condition has to be reached - 
and therefore evaluated - at all. Even if all locations before a condition are hot 
this does not tell us anything about when exactly the condition is evaluated. 
The designer would have to use a timing annotation to force a condition to be 
evaluated at a certain time or within a certain time interval. 

5 Integration with STVE 

The further incorporation of LSCs into the STVE is currently under way. The 
transformation of LSCs into TBAs described above is only one issue when con- 
necting LSCs to the tool set. We have also developed an editor and a mapping 
tool for LSCs. Since LSCs should not only be used in the context of Statemate, 
the LSC identifiers for instances, messages and those used in conditions are only 
symbolic names, i.e. place holders for concrete identifiers of the model to be ver- 
ified. Based on the internal representation of Statemate designs in the STVE 
(cf. [BBea99]) the mapping tool allows the user to associate the activities of the
Statemate design with the instances of an LSC. The interfaces of the selected
activities are computed and also offered to the user for identification of messages 
and conditions with certain variables and their valuations. The mapping of LSC 
objects to design items yields boolean expressions as atomic propositions of the 
temporal logic formulae. 

Figure 5 gives an overview of the LSC tools in the Statemate context.
For a given Statemate design, LSC requirements are created with the LSC
editor. The requirements thus created are then translated by the LSC compiler,
which implements the unwinding procedure described in this paper, into the
intermediate TBA format from which the temporal logic formula is generated.
Since only symbolic names are used in LSCs, the TL formula only contains
propositions which regard these symbolic names. Therefore we need the LSC
mapper to relate the Statemate identifiers to the LSC identifiers. The result is
a table which gives for each proposition (which consists of symbolic names) the
concrete model elements. This table together with the formula generated from
the LSC form the input for the modelchecker (marked in figure 5). The STVE also
translates the Statemate model into the input format for the modelchecker,
which then determines whether the model satisfies the requirement.
Fig. 5. Integration of LSCs with STVE



6 Conclusion 

We have shown in this paper how the rather high-level semantics for LSCs pre- 
sented in [DH99] can be elaborated. We only considered a subset of LSC features, 
which consists of what we feel are the LSC core concepts. We made some fur- 
ther restrictions either for simplicity’s sake or due to limitations imposed by 
Statemate (zero-delay messages). We then demonstrated how this subset may 
be efficiently transformed into an automaton with the focus on the technical
procedure of unwinding the LSC. Having arrived at the TBA format, the gateway
to the STVE is wide open. This format is also used for code generation from
STDs in conjunction with a Statemate design and to synthesize state charts 
from STDs. These routes are possible for LSCs as well, although at the moment 
we have only used LSCs for formal verification as described in section 5. 

The verification of Statemate models against LSCs has at the moment of 
writing not been tested extensively, so that more experience is needed in this re- 
spect. Especially the issue of complexity needs further investigation. While STDs 
are used to specify properties of components in a black box view, the benefit of 
LSCs is the ability to specify protocols in a glass box view of the system. There- 
fore it is quite natural to consider the whole model at once, while STDs allow the 
user to scale down the verification task by system verification (cf. [BBea99]). To 
verify large models against LSC specifications we will need powerful abstraction 
techniques to reduce the state space for the verification. Complexity also has to 
be investigated on the requirement side, where the formula may become very 
large depending on the size of the LSC and the degree of parallelism it contains. 

In the future we plan to extend both LSCs and the verification tool set to 
also cope with UML models. In this respect not only the concept of synchronous 
and asynchronous communication has to be reflected in LSCs, but we will also 
need to develop strategies for verification in the object-oriented world. 
Abstract. In formal verification, we verify that a system is correct with respect
to a specification. Even when the system is proven to be correct, there is still a 
question of how complete the specification is, and whether it really covers all 
the behaviors of the system. In this paper we study coverage metrics for model 
checking. Coverage metrics are based on modifications we apply to the system in 
order to check which parts of it were actually relevant for the verification process 
to succeed. We introduce two principles that we believe should be part of any 
coverage metric for model checking: a distinction between state-based and logic- 
based coverage, and a distinction between the system and its environment. We 
suggest several coverage metrics that apply these principles, and we describe two 
algorithms for finding the uncovered parts of the system under these definitions. 
The first algorithm is a symbolic implementation of a naive algorithm that model 
checks many variants of the original system. The second algorithm improves the 
naive algorithm by exploiting overlaps in the variants. We also suggest a few 
helpful outputs to the user, once the uncovered parts are found. 



1 Introduction 

In model checking [CE81,QS81,LP85], we verify the correctness of a finite-state system
with respect to a desired behavior by checking whether a labeled state-transition graph
that models the system satisfies a specification of this behavior, expressed in terms of
a temporal logic formula or a finite automaton. Beyond being fully automatic, an
additional attraction of model-checking tools is their ability to accompany a negative answer
to the correctness query by a counterexample to the satisfaction of the specification in 
the system. Thus, together with a negative answer, the model checker returns some er- 
roneous execution of the system. These counterexamples are very important and they 
can be essential in detecting subtle errors in complex designs [CGMZ95]. On the other
hand, when the answer to the correctness query is positive, most model-checking tools 
terminate with no further information to the user. Since a positive answer means that the 
system is correct with respect to the specification, this at first seems like a reasonable 
policy. In the last few years, however, there has been growing awareness of the
importance of suspecting the system of containing an error also in the case model checking
succeeds. The main justifications for such suspicion are possible errors in the modeling of
the system or of the behavior, and possible incompleteness in the specification.

There are various ways to look for possible errors in the modeling of the system 
or the behavior. One direction is to detect vacuous satisfaction of the specification 
[BBER97,KV99], where cases like antecedent failure [BB94] make parts of the
specification irrelevant to its satisfaction. For example, the specification
$\varphi = AG(req \rightarrow AF\,grant)$ is vacuously satisfied in a system in which req is always false. A similar
direction is to check the validity of the specification. Clearly, a valid specification is 
satisfied trivially, and suggests some problem. A related approach is taken in the pro- 
cess of constraint validation in the verification tool FormalCheck [Kur98], where sanity 
checks for constraint validation include a search for enabling conditions that are never 
enabled, and a replacement of all or some of the constraints by false. FormalCheck also 
keeps track of variables and values of variables that were never used in the process of 
model checking. 

It is less clear how to check completeness of the specification. Indeed, specifications 
are written manually, and their completeness depends entirely on the competence of the 
person who writes them. The motivation for such a check is clear: an erroneous behavior 
of the system can escape the verification efforts if this behavior is not captured by the 
specification. In fact, it is likely that a behavior not captured by the specification also 
escapes the attention of the designer, who is often the one to provide the specification. 

This direction, of checking whether the specification describes the system exhaus- 
tively, has roots in simulation-based verification techniques, where coverage metrics 
are used to improve the quality of test vectors. For example, code coverage [CK93] 
measures the fraction of HDL statements executed during simulation, transition cov- 
erage [HYHD95,HMA95] measures the fraction of transitions executed, and tag cov- 
erage [DGK96] attributes variables with tags that are used to detect whether assign- 
ing a forbidden value to a variable leads to an erroneous behavior of the system. In 
[FDK98,FAD99], Fallah et al. compute the tag coverage achieved by simulation and
generate simulation sequences that cover a given tagged variable. Of a similar nature is
the tour-generation algorithm in [HYHD95], which generates test vectors that traverse
all states of the system. Ho and Horowitz [HH96] define test coverage in terms of
control events. Each control event identifies an interesting subset of the control variables,
and the test vectors have to cover all the events. They also define design coverage by
means of the states and edges covered by the test vectors (see also [MAH98]). In order
to circumvent the state-explosion problem in these methods, Bergmann and Horowitz
develop the technique of projection directed state exploration, which makes it possible
to compute the above coverage metrics for small portions of the design [BH99]. Coverage
metrics are helpful as an indicator of whether the simulation process has been exhaustive.
Still, simulation-based verification techniques lack a uniform definition of coverage.

Following the same considerations, analyzing coverage in model checking can dis- 
cover parts of the system that are not relevant for the verification process to succeed. 
Low coverage can point to several problems. One possibility is that the specification 
is not complete enough to fully describe all the possible behaviors of the system. In 
this case, the output of a coverage check is helpful in completing the specification.
Another possibility is that the system contains redundancies. In this case, the output of the
coverage check is helpful in simplifying the system.

Two approaches for defining and developing algorithms for coverage metrics in 
temporal logic model checking are studied in the literature. The first approach, of Katz 
et al. [KGG99], is based on a comparison of the system with a tableau of the
specification. Essentially, a tableau of a universal specification $\varphi$ is a system that satisfies $\varphi$ and
subsumes all the behaviors allowed by $\varphi$. By comparing a system with the tableau of $\varphi$,
Katz et al. are able to detect parts of the system that are irrelevant to the satisfaction of
the specification, to detect different behaviors of the system that are indistinguishable
by the specification, and to detect behaviors that are allowed by the specification but not
generated by the system. Such cases imply that the specification is incomplete or not
sufficiently restrictive. The tableau used in [KGG99] is reduced: a state of the tableau
is associated with subformulas that have to be true in it, and it induces no obligations on
the other, possibly propositional, subformulas. This leads to smaller and less restrictive
tableaux. Still, we found the approach in [KGG99] too strict. Indeed, a system passes
the criteria in [KGG99] iff it is bisimilar to the tableau of the specification, but we want
specifications to be much more abstract than their implementations^.

The second approach, of Hoskote et al. [HKHZ99], is to define coverage by
examining the effect of modifications in the system on the satisfaction of the specification.
Given a system modeled by a Kripke structure $K$, a formula $\varphi$ satisfied in $K$, and a
signal (atomic proposition) $q$, a state $w$ of $K$ is $q$-covered by $\varphi$ if the Kripke structure
obtained from $K$ by flipping the value of $q$ in $w$ no longer satisfies $\varphi$ (the signal $q$
corresponds to a boolean variable that is true if $w$ is labeled with $q$ and is false otherwise,
so when we say that we flip the value of $q$, we mean that we switch the value of this
variable). Indeed, this indicates that the value of $q$ in $w$ is crucial for the satisfaction
of $\varphi$ in $K$. The signal $q$ is referred to as the observable signal. Let us denote by $\tilde{K}_{w,q}$
the Kripke structure obtained from $K$ by flipping the value of $q$ in $w$, and denote by
$q\text{-}cover(K, \varphi)$ the set of states $q$-covered by $\varphi$ in $K$. It is easy to see that $q\text{-}cover(K, \varphi)$
can be computed by a naive algorithm that performs model checking of $\varphi$ in $\tilde{K}_{w,q}$ for
each state $w$ of $K$. By [HKHZ99], a state is covered if it belongs to $q\text{-}cover(K, \varphi)$ for
some observable signal $q$.
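
The naive algorithm can be made concrete with a small explicit-state sketch in Python. It is an illustration under assumptions of our own, not the symbolic algorithm of this paper: a Kripke structure is a triple of states, successor sets and labels, formulas are nested tuples over a small CTL core (true, atomic propositions, negation, disjunction, EX, EU, EG), and all function names as well as the three-state example are hypothetical.

```python
def sat(K, f):
    """Set of states of K satisfying CTL formula f (explicit fixpoints)."""
    states, succ, label = K
    op = f[0]
    if op == "true":
        return set(states)
    if op == "ap":
        return {s for s in states if f[1] in label[s]}
    if op == "not":
        return set(states) - sat(K, f[1])
    if op == "or":
        return sat(K, f[1]) | sat(K, f[2])
    if op == "EX":
        target = sat(K, f[1])
        return {s for s in states if succ[s] & target}
    if op == "EU":  # least fixpoint
        hold, result = sat(K, f[1]), sat(K, f[2])
        while True:
            new = result | {s for s in hold if succ[s] & result}
            if new == result:
                return result
            result = new
    if op == "EG":  # greatest fixpoint
        result = sat(K, f[1])
        while True:
            new = {s for s in result if succ[s] & result}
            if new == result:
                return result
            result = new
    raise ValueError(f"unknown operator {op!r}")


def AG(f):
    return ("not", ("EU", ("true",), ("not", f)))


def AF(f):
    return ("not", ("EG", ("not", f)))


def implies(f, g):
    return ("or", ("not", f), g)


def flip(K, w, q):
    """Copy of K in which the truth value of q is flipped in state w."""
    states, succ, label = K
    new_label = dict(label)
    new_label[w] = label[w] ^ {q}
    return states, succ, new_label


def q_cover(K, init, f, q):
    """Naive algorithm: one model-checking run per flipped structure."""
    return {w for w in K[0] if init not in sat(flip(K, w, q), f)}


# Hypothetical system: a request in state 0 is granted in state 2.
K = ({0, 1, 2},
     {0: {1}, 1: {2}, 2: {2}},
     {0: frozenset({"req"}), 1: frozenset(), 2: frozenset({"grant"})})
phi = AG(implies(("ap", "req"), AF(("ap", "grant"))))
grant_cover = q_cover(K, 0, phi, "grant")
req_cover = q_cover(K, 0, phi, "req")
```

In this example only state 2 is grant-covered, and no state is req-covered, since flipping req never makes the specification fail.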

Hoskote et al. describe an algorithm for computing the set of states that are
$q$-covered by a formula $\varphi$ in the logic acceptable ACTL. Acceptable ACTL is a restriction
of the universal fragment ACTL of CTL in which no disjunctions are allowed and all
the implications $\alpha \rightarrow \beta$ are such that $\alpha$ is propositional. The algorithm in [HKHZ99]
is applied to $\varphi$ after an observability transformation. The restricted syntax of
acceptable ACTL together with the observability transformation lead to a symbolic algorithm
that, like CTL model checking, requires linear time^. On the other hand, the set of
states designated as $q$-covered by the algorithm is not $q\text{-}cover(K, \varphi)$. It is claimed in

^ The approach in [KGG99] also has some technical and computational drawbacks: the
specification considered is the (big) conjunction of all the properties the system should satisfy,
the complexity of the algorithm is exponential in the specification (for $\varphi$ in ACTL), and it is
restricted to universal safety specifications whose tableaux have no fairness constraints.

^ The restricted syntax of acceptable ACTL and the observability transformation force $\tilde{K}_{w,q}$,
for all states $w$, to satisfy $\varphi$ in exactly the same way $K$ does. For example, if a path in $K$




Coverage Metrics for Temporal Logic Model Checking 531 



[HKHZ99] that the set found by the algorithm better meets the intuition of coverage.
One can argue whether this is indeed the case; we actually found several behaviors
of the algorithm in [HKHZ99] quite counter-intuitive (for example, the algorithm is
syntax-dependent, thus, equivalent formulas may induce different coverage sets; in
particular, the set of states $q$-covered by the tautology $q \rightarrow q$ is the set of states that satisfy
$q$, rather than the empty set, which meets our intuition of coverage). Anyway, this is not
the point we want to make here: there are many possible ways of defining coverage
sets, each way has its advantages, and there need not be a best way. The point we want
to make in this paper is that there are two important principles that should be part of any
coverage metric for temporal logic model checking: a distinction between state-based
and logic-based approaches, and a distinction between the system and its environment.
These principles, which we explain below, are not applied in [HKHZ99] and in other
work on coverage hitherto.

The first principle, namely a distinction between state-based and logic-based ap- 
proaches, is based on the observation that there are several ways to model a system, and 
the different ways should induce different references to the observability signal and its 
modification. Recall [HKHZ99]’s definition of coverage. Hoskote et al. model systems 
by Kripke structures and the observable signal q is one of the atomic propositions that 
label the states of the structure and encode the system’s variables. For every state w, the 
truth value of $q$ is flipped in $\tilde{K}_{w,q}$. This approach is state based, as it modifies $q$ in each
of the states. When the system is modeled as a circuit and its state space is $2^V$, for the
set $V$ of signals, transitions are given as relations between current and next values of the
signals in $V$ [MP92]. Then, flipping the value of a signal in a state changes not only the
“label” of the state but also the transitions to and from the state. So, in the state-based 
approach, we consider modifications that refer to a single state of the system and to 
the adjacent transitions. When the system is modeled as a circuit, we can also take the 
logic-based approach, where we do not flip the value of a signal in a particular state, 
but rather, fix the signal to 0, 1, or “don’t care” everywhere in the circuit, and check the 
effect of these fixes on the satisfaction of the specification. 

In order to explain the second principle, namely a distinction between a system and 
its environment, assume a definition of coverage in which a state is covered iff its re- 
moval violates the specification. Since universal properties are preserved under state 
removal, no state would be covered by a universal specification in such a definition. So, 
is this a silly definition? The definition makes perfect sense in the context of closed sys- 
tems. There, universal properties can be satisfied by the empty system, and if a designer 
wants the system to do something (more than just being correct), this something should 
be specified by an existential specification. On the other hand, in an open system, which 
interacts with its environment, the above definition makes sense only if we restrict it to 
states whose removal leaves the system responsive to all the behaviors of the environ- 
ment and does not deadlock the interaction between the system and its environment. 
Indeed, we cannot remove a state if the environment can force a visit to it. Likewise, 
it makes no sense to talk about $q$-coverage for a signal $q$ that corresponds to an input
variable. Indeed, it is the environment that determines the value of $q$, we cannot flip its

satisfies $\alpha U \beta$ by fulfilling $\beta$ in the present, this path is expected to satisfy $\beta$ in the present also
in $\tilde{K}_{w,q}$. This restriction is what makes the algorithm so efficient.
value, and anyway we cannot talk about states being $q$-covered or not: all the values of
$q$ should be around simply since the environment can force them all. Hence, in the
definition of coverage metrics, at both the design and implementation levels, there should
be a distinction between input and output variables, and coverage should be examined
only with respect to elements the system has control over.

The contribution of our paper is as follows. We introduce the above two principles 
in the definition of coverage, and we give several coverage metrics that apply them. Our 
definitions are similar to the one in [HKHZ99] in the sense that they consider the influ- 
ence of local modifications of the system on the satisfaction of the specification (in fact, 
[HKHZ99] can be viewed as a special case of our state-based approach, for a closed 
system, with C H O = 0, and with only output variables being observable). Hence, the 
naive algorithm, which finds the set of covered states (in the state-based approach) or 
signals (in the logic-based approach) by model checking each of the modified systems 
is applicable. We describe two alternatives to this naive algorithm. The first alternative 
is a symbolic approach to finding the uncovered parts of the system. The second alter- 
native is an algorithm that makes use of overlaps among the modified systems — since 
each modification involves a small change in the original system, there is a great deal 
of work that can be shared when we model check all the modified systems. Both
algorithms work for full CTL, and the ideas in them can be adapted to various definitions
of coverage. Once the set of covered states is found, we suggest a few helpful outputs 
to the user (more helpful than just the percentage of covered states). 

Due to lack of space, many details are omitted from this version. A full version of 
the paper can be found in the authors’ URLs. 

2 Coverage Metrics 

In this section we suggest several coverage metrics for temporal logic model checking.
As described in Section 1, we distinguish between a state-based and a logic-based
approach to coverage, and we distinguish between a system and its environment. Our
definitions are independent of the temporal logic being used. We assume the reader is
familiar with temporal logic. In particular, the algorithms we are going to present are
for the branching time logic CTL. Formulas of CTL are built from a set $AP$ of atomic
propositions using the boolean operators $\neg$ and $\vee$, the temporal operators $X$ (“next”)
and $U$ (“until”), and the path quantifiers $E$ (“exists a path”) and $A$ (“for all paths”).
Every temporal operator must be immediately preceded by a path quantifier. The
semantics of temporal logic formulas is defined with respect to Kripke structures. For a
full definition of Kripke structures, and the syntax and semantics of CTL, see [Eme90]
and the full version of this paper. For a formula $\varphi$ (and an agreed Kripke structure $K$), we
denote by $\|\varphi\|$ the set of states that satisfy $\varphi$ in $K$, and use $cl(\varphi)$ to denote the set of
$\varphi$'s subformulas. A Kripke structure $K$ satisfies a formula $\varphi$, denoted $K \models \varphi$, iff $\varphi$
holds in the initial state of $K$. The problem of determining whether $K$ satisfies $\varphi$ is the
model-checking problem.

We distinguish between two types of systems: closed and open [HP85]. A closed 
system is a system whose behavior is completely determined by the state of the system. 
An open system is a system that interacts with its environment and whose behavior
depends on external nondeterministic choices made by the environment [Hoa85]. In
a Kripke structure, all the atomic propositions describe internal signals, thus Kripke 
structures model closed systems. We study here open systems, and we model them by 
sequential circuits. 

A sequential circuit (circuit, for short) is a tuple $S = \langle I, O, C, \theta, \rho, \delta \rangle$, where $I$ is
a set of input signals, $O$ is a set of output signals, and $C$ is a set of control signals that
induce the state space $2^C$. Accordingly, $\theta \in 2^C$ is an initial state, $\rho : 2^C \times 2^I \to 2^C$
is a deterministic transition function, and $\delta : 2^C \to 2^O$ is an output function. Possibly
$O \cap C \neq \emptyset$, in which case for all $x \in O \cap C$ and $s \in 2^C$, we have $x \in s$ iff $x \in \delta(s)$.
Thus, $\delta(s)$ agrees with $s$ on signals in $C$. We partition the signals in $O \cup C$ into three
classes as follows. A signal $x \in O \setminus C$ is a pure-output signal. A signal $x \in C \setminus O$ is a
pure-control signal. A signal $x \in C \cap O$ is a visible-control signal. While pure-output
signals have no control on the transitions of the system, a specification of the system
can refer only to the values of the pure-output or the visible-control signals.

We define the semantics of CTL with respect to circuits by means of the Kripke
structure they induce. A circuit $S = \langle I, O, C, \theta, \rho, \delta \rangle$ can be converted to a Kripke
structure $K_S = \langle I \cup C \cup O, 2^C \times 2^I, R, \langle \theta, \emptyset \rangle, L \rangle$, where for all $s$ and $s'$ in $2^C$, and
$i$ and $i'$ in $2^I$, we have $R(\langle s, i \rangle, \langle s', i' \rangle)$ iff $\rho(s, i) = s'$. Also, $L(\langle s, i \rangle) = \delta(s) \cup i \cup s$.
Note that each state in $K_S$ has $2^{|I|}$ successors, reflecting external nondeterminism
induced by the environment of $S$. We assume that the interaction between the circuit
and its environment is initiated by the circuit, hence the single initial state. The other
possibility, where the interaction between the circuit and its environment is initiated by
the environment, corresponds to a Kripke structure with a set $\{\theta\} \times 2^I$ of initial states. Our
definitions and algorithms assume a single initial state, yet they can be easily modified
to handle multiple initial states.
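
To make the construction concrete, the following is a minimal Python sketch of how a small circuit could be unfolded into the Kripke structure it induces. It assumes that signals are strings, that states are frozensets of signals, and that $\rho$ and $\delta$ are given as Python callables; the helper names (`powerset`, `circuit_to_kripke`) and the one-latch example circuit are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def powerset(signals):
    """All subsets of a set of signal names, as frozensets."""
    items = sorted(signals)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def circuit_to_kripke(I, O, C, theta, rho, delta):
    """Build the Kripke structure induced by a circuit.

    rho(s, i) and delta(s) are callables on frozensets (an assumption of
    this sketch); states of the result are pairs (s, i).
    """
    inputs = powerset(I)
    W = [(s, i) for s in powerset(C) for i in inputs]
    # (s, i) moves to (rho(s, i), i2) for every possible next input i2.
    R = {(s, i): {(rho(s, i), i2) for i2 in inputs} for (s, i) in W}
    # A state is labeled by its outputs, its inputs, and its control signals.
    L = {(s, i): delta(s) | i | s for (s, i) in W}
    w_in = (frozenset(theta), frozenset())
    return frozenset(I) | frozenset(O) | frozenset(C), W, R, w_in, L


# Hypothetical one-latch circuit: latch c stores input a, output o mirrors c.
I, O, C = {"a"}, {"o"}, {"c"}
rho = lambda s, i: frozenset({"c"}) if "a" in i else frozenset()
delta = lambda s: frozenset({"o"}) if "c" in s else frozenset()
AP, W, R, w_in, L = circuit_to_kripke(I, O, C, frozenset(), rho, delta)
```

In this example every state has $2^{|I|} = 2$ successors, one for each input the environment may choose next.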

We now define what it means for a specification to cover a circuit. Let $S$ be a circuit
that satisfies a specification $\varphi$. We want to check whether the specification describes $S$
exhaustively. Intuitively, the uncovered part of $S$ is the part that can be modified without
falsifying $\varphi$ in $S$. Formally, we suggest several definitions of coverage, reflecting the
various possible ways in which a part of $S$ can be modified.

We start with the state-based definition. Here, we check whether the satisfaction of
$\varphi$ is sensitive to local changes in the values of output and control signals; i.e., changes
in one state.

For a circuit $S = \langle I, O, C, \theta, \rho, \delta \rangle$, a state $s \in 2^C$, and a signal $x \in C$, we define
the $x$-twin of $s$, denoted $twin_x(s)$, as the state $s'$ obtained from $s$ by dualizing the value
of $x$. Thus, $x \in s'$ iff $x \notin s$. Now, given $S$, $s$, and a signal $x \in O \cup C$, we define the
dual circuit $\tilde{S}_{s,x} = \langle I, O, C, \tilde{\theta}, \tilde{\rho}, \tilde{\delta} \rangle$ as follows.

- If $x$ is a pure-output signal, then $\tilde{\theta} = \theta$, $\tilde{\rho} = \rho$, and $\tilde{\delta}$ is obtained from $\delta$ by dualizing
the value of $x$ in $s$, thus $x \in \tilde{\delta}(s)$ iff $x \notin \delta(s)$.

- If $x$ is a pure-control signal, then $\tilde{\delta} = \delta$, and $\tilde{\theta}$ and $\tilde{\rho}$ are obtained by replacing all
the occurrences of $s$ in $\theta$ and in the range of $\rho$ by $twin_x(s)$. Thus, if $\theta = s$, then
$\tilde{\theta} = twin_x(s)$ (otherwise, $\tilde{\theta} = \theta$), and for all $s' \in 2^C$ and $i \in 2^I$, if $\rho(s', i) = s$,
then $\tilde{\rho}(s', i) = twin_x(s)$ (otherwise, $\tilde{\rho}(s', i) = \rho(s', i)$).

- If $x$ is a visible-control signal, then we do both changes. Thus, $\tilde{\delta}$ is obtained from
$\delta$ by dualizing the value of $x$ in $s$, and $\tilde{\theta}$ and $\tilde{\rho}$ are obtained by replacing all the
occurrences of $s$ in $\theta$ and in the range of $\rho$ by $twin_x(s)$.

Intuitively, dualizing a control signal $x$ in a state $s$ in $S$ means that all the transitions
leading to $s$ are now directed to its $x$-twin. In particular, the state $s$ is no longer reachable
in $\tilde{S}_{s,x}$ (which is why we do not have to redefine $\rho(s, i)$). For a specification $\varphi$ such that
$S \models \varphi$, a state $s$ is $x$-covered by $\varphi$ if $\tilde{S}_{s,x}$ does not satisfy $\varphi$.
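
The three cases of the dual-circuit definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transition and output functions are stored as explicit dictionaries over frozensets, and the names `twin`, `dual_circuit`, and the one-latch example are hypothetical rather than part of the paper.

```python
def twin(s, x):
    """The x-twin of state s: s with the value of x flipped."""
    return s ^ {x}


def dual_circuit(O, C, theta, rho, delta, s, x):
    """Return (theta', rho', delta') of the dual circuit for state s, signal x.

    rho maps (state, input) -> state and delta maps state -> outputs,
    both as explicit dicts over frozensets (an assumption of this sketch).
    """
    new_theta, new_rho, new_delta = theta, dict(rho), dict(delta)
    if x in O:  # pure-output or visible-control: flip x in the output of s
        new_delta[s] = delta[s] ^ {x}
    if x in C:  # pure-control or visible-control: redirect s to its twin
        t = twin(s, x)
        if theta == s:
            new_theta = t
        for key, target in rho.items():
            if target == s:
                new_rho[key] = t
    return new_theta, new_rho, new_delta


# Hypothetical one-latch circuit: latch c stores input a, output o mirrors c.
EMPTY, A, LATCH = frozenset(), frozenset({"a"}), frozenset({"c"})
rho = {(s, i): (LATCH if i else EMPTY)
       for s in (EMPTY, LATCH) for i in (EMPTY, A)}
delta = {EMPTY: EMPTY, LATCH: frozenset({"o"})}
ctl = dual_circuit({"o"}, {"c"}, EMPTY, rho, delta, LATCH, "c")
out = dual_circuit({"o"}, {"c"}, EMPTY, rho, delta, LATCH, "o")
```

Dualizing the control signal `c` in state `{c}` redirects every transition that used to enter `{c}`, while dualizing the pure-output signal `o` changes only the output of that one state.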

Note that it makes no sense to define coverage with respect to observable input 
signals. This is because an open system has no control on the values of the input signals, 
which just resolve the external nondeterminism of the system. In a closed system, the 
set of input signals is empty, and thus the system has the control on all its variables. 
Therefore in closed systems we can define coverage with respect to all signals. 

Our second approach to coverage, which we call logic-based coverage, does not
refer to a particular state of $S$, and it examines the possibility of fixing some control
signals to 0 or 1. For a circuit $S = \langle I, O, C, \theta, \rho, \delta \rangle$ and a control signal $x \in C$, the
$x$-fixed-to-1 circuit $S_{x,1} = \langle I, O, C, \theta', \rho', \delta \rangle$ is obtained from $S$ by replacing all the
occurrences of $x$ in $\theta$ and in the range of $\rho$ by 1; i.e., $\theta' = \theta \cup \{x\}$, and for all $s \in 2^C$
and $i \in 2^I$, we have $\rho'(s, i) = \rho(s, i) \cup \{x\}$. Similarly, the $x$-fixed-to-0 circuit $S_{x,0}$ is
defined by replacing all the occurrences of $x$ in $\theta$ and in the range of $\rho$ by 0. A control
signal $x$ is 1-covered if $S_{x,1}$ does not satisfy $\varphi$. Similarly, $x$ is 0-covered if $S_{x,0}$ does
not satisfy $\varphi$.^
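
As a small illustration of the logic-based definition, the Python sketch below builds the initial state and transition table of an $x$-fixed circuit. It assumes the transition function is an explicit dictionary over frozensets; `fix_signal` and the example circuit are hypothetical names used only for this sketch, and the model-checking step that decides whether $x$ is 0-covered or 1-covered is not shown.

```python
def fix_signal(theta, rho, x, value):
    """Return (theta', rho') with control signal x fixed to value (0 or 1).

    rho is an explicit dict (state, input) -> state over frozensets,
    which is an assumption of this sketch.
    """
    def fix(state):
        # Fixing to 1 adds x to the state, fixing to 0 removes it.
        return state | {x} if value else state - {x}

    return fix(theta), {key: fix(target) for key, target in rho.items()}


# Hypothetical one-latch circuit: latch c stores input a.
EMPTY, A, LATCH = frozenset(), frozenset({"a"}), frozenset({"c"})
rho = {(s, i): (LATCH if i else EMPTY)
       for s in (EMPTY, LATCH) for i in (EMPTY, A)}
theta1, rho1 = fix_signal(EMPTY, rho, "c", 1)
theta0, rho0 = fix_signal(EMPTY, rho, "c", 0)
```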

3 Algorithms for Computing Coverage 

In this section we describe algorithms for solving the following problems. Let $\varphi$ be a
specification in CTL.

- Given a circuit $S$ that satisfies $\varphi$ and an observable output or control signal $x$, return
the set of states not $x$-covered by $\varphi$ in $S$.

- Given a circuit $S$ that satisfies $\varphi$, return the set of control signals that can be fixed
to 0 or fixed to 1.

All these problems have a naive algorithm that model checks each of the corresponding
dual circuits. For example, in order to find the set of states not $x$-covered by
$\varphi$ in a circuit $S = \langle I, O, C, \theta, \rho, \delta \rangle$, with the observable signal $x$ being a pure-output
signal, the naive algorithm executes the model-checking procedure $|2^C|$ times, once for
each dual circuit, where each dual circuit is obtained from $S$ by dualizing the value
of $x$ in one state. A similar thing happens in the naive algorithm with the observable
signal being a control signal, only that here the dual circuits also differ in some of
their transitions. Finally, the naive algorithm for the logic-based coverage executes the
model-checking procedure $|C|$ times, once for each control signal.

^ The logic-based definition of coverage is closely related to the notion of observability don’t
care conditions as presented in [HS96]. There, there is a set $C' \subseteq C$ of control signals and an
assignment for the signals in $C'$, denoted by the set $\alpha \subseteq C'$ of signals that are assigned true
in this assignment, such that the behavior of the output signals in all the states $\alpha \cup \beta$, for all
$\beta \subseteq C \setminus C'$, is the same. Thus, whenever the system is in a state in which the control signals
in $C'$ have value $\alpha$, the value of the other control signals is “don’t care”.

We present two alternatives to the naive algorithm. The first is a symbolic algorithm 
that manipulates pairs of sets of states, and the second is an algorithm that makes use 
of overlaps among the various dual circuits. We are going to describe our coverage 
algorithms in terms of Kripke structures. For that, we first need to adjust the definitions 
in Section 2 to Kripke structures. For a Kripke structure $K$, an observable signal $q$, a
set $Y \subseteq W$ of states, and sets $Z^+$ and $Z^-$ of transitions in $W \times W$, the dual Kripke
structure $\tilde{K}(q, Y, Z^+, Z^-)$ is obtained from $K$ by dualizing the value of $q$ in states in
$Y$, adding to $R$ transitions in $Z^+$, and removing from $R$ transitions in $Z^-$. It is easy to
see that all the dualizations of circuits mentioned in Section 2 can be described in terms 
of dualizations of the Kripke structures they induce. 

For simplicity, we describe in detail the algorithms for computing the set of $x$-covered
states for a pure-output signal $x$ and the case $I = \emptyset$. The same ideas work
for the more general cases. Some generalizations are very simple (e.g., releasing the
requirement for $I$ being empty) and some result in a significantly more complicated
algorithm (e.g., when the modifications involve control signals, where transitions are
modified). For the detailed explanation of the required modifications in both algorithms
presented below, see the full version of this paper. For our special case, we define, for a
specification $\varphi$, a Kripke structure $K$ that satisfies $\varphi$, and an observable atom $q$, the set
of states $q$-covered by $\varphi$ in $K$ as the set of states $w$ such that the dual Kripke structure
$\tilde{K}_{w,q} = \tilde{K}(q, \{w\}, \emptyset, \emptyset)$ does not satisfy $\varphi$. Thus, a state $w$ of $K$ is not $q$-covered if
we can flip the value of $q$ in it and still satisfy $\varphi$. This definition, which is studied in
[HKHZ99], corresponds to the special case of the observable signal being a pure-output
signal and $I$ being empty. So, in the algorithms below, we are given a Kripke structure
$K = \langle AP, W, R, w_{in}, L \rangle$ that satisfies a CTL formula $\varphi$, and an observable atom $q$, and
we look for the set of states $w$ such that $\tilde{K}_{w,q}$ does not satisfy $\varphi$.

The naive algorithm for computing the set of $q$-covered states performs model
checking of $\varphi$ in $\tilde{K}_{w,q}$ for each $w \in W$. Since CTL model-checking complexity is
linear in the Kripke structure and the specification, the naive algorithm requires time
$O(|K| \cdot |\varphi| \cdot |W|)$, which is quadratic in the size of the structure^. Our symbolic
algorithm returns an OBDD of the $q$-covered states. Our improved algorithm has an average
running time of $O(|K| \cdot |\varphi| \cdot \log |W|)$.
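
The naive algorithm can be sketched directly in Python for small explicit-state structures. The sketch below is an illustration only: it assumes formulas are nested tuples over the operators `ap`, `not`, `or`, `EX`, and `EU`, and the function names (`sat`, `dual_labeling`, `naive_q_cover`) and the three-state example are hypothetical.

```python
def sat(W, R, L, f):
    """Return the set of states of (W, R, L) that satisfy the CTL formula f."""
    op = f[0]
    if op == "ap":
        return {w for w in W if f[1] in L[w]}
    if op == "not":
        return set(W) - sat(W, R, L, f[1])
    if op == "or":
        return sat(W, R, L, f[1]) | sat(W, R, L, f[2])
    if op == "EX":
        target = sat(W, R, L, f[1])
        return {w for w in W if R[w] & target}
    if op == "EU":
        hold, result = sat(W, R, L, f[1]), sat(W, R, L, f[2])
        while True:  # least fixed point: add states that can step into result
            new = {w for w in hold if R[w] & result} - result
            if not new:
                return result
            result |= new
    raise ValueError(f"unknown operator {op!r}")


def dual_labeling(L, w, q):
    """Labeling of the dual structure: the value of q is flipped in state w."""
    flipped = dict(L)
    flipped[w] = L[w] ^ {q}
    return flipped


def naive_q_cover(W, R, L, w_in, f, q):
    """States w such that flipping q in w falsifies f in the initial state."""
    if w_in not in sat(W, R, L, f):
        raise ValueError("the structure must satisfy the specification")
    return {w for w in W if w_in not in sat(W, R, dual_labeling(L, w, q), f)}


# Hypothetical structure: state 0 branches to the q-labeled sinks 1 and 2.
W = [0, 1, 2]
R = {0: {1, 2}, 1: {1}, 2: {2}}
L = {0: set(), 1: {"q"}, 2: {"q"}}
AX_q = ("not", ("EX", ("not", ("ap", "q"))))
EX_q = ("EX", ("ap", "q"))
universal = naive_q_cover(W, R, L, 0, AX_q, "q")
existential = naive_q_cover(W, R, L, 0, EX_q, "q")
```

In this example both successors are covered by the universal property $AXq$, whereas neither is covered by $EXq$, since the other successor still witnesses the formula after a single flip.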

3.1 Algorithm 1: A Symbolic Approach 

Consider a Kripke structure $K = \langle AP, W, R, w_{in}, L \rangle$ and an atomic proposition $q \in AP$.
For a CTL formula $\varphi$, we define

$P(\varphi) = \{(w, v) : \tilde{K}_{v,q}, w \models \varphi\}$.

Thus, $P(\varphi) \subseteq W \times W$ contains exactly all pairs $(w, v)$ such that $w$ satisfies $\varphi$ in the
structure where we dualize the value of $q$ in $v$. The definition of $P(\varphi)$ may not appear
helpful, as we are interested in the set of states in which dualizing the value of $q$ falsifies
$\varphi$. Nevertheless, the $q$-covered set in $K$ for $\varphi$ can be derived easily from $P(\varphi)$ as it is
the set $\{w : (w_{in}, w) \notin P(\varphi)\}$.

^ The algorithm for computing the set of $q$-covered states in [HKHZ99] runs in time $O(|K| \cdot |\varphi|)$.
As we discuss in Section 1, however, the algorithm calculates the set of $q$-covered states
according to a different definition of coverage, which we found less intuitive, and it handles
only a restricted subset of ACTL.

Our symbolic algorithm computes the OBDDs $P(\psi)$ for all $\psi \in cl(\varphi)$. The
algorithm works bottom-up, and is based on the symbolic CTL model-checking algorithm.
We assume that we have already performed symbolic model checking of $\varphi$ in $K$; thus
the sets $\|\psi\|$ have already been computed for all $\psi \in cl(\varphi)$. Let $S(\psi)$ denote the set of
pairs $\|\psi\| \times W$. In each step of the algorithm, it calculates $P(\psi)$ for some $\psi \in cl(\varphi)$,
based on the sets $S(\psi')$ for $\psi' \in cl(\psi)$, and on the sets $P(\psi'')$ for $\psi'' \in cl(\psi) \setminus \{\psi\}$.

The algorithm uses the procedures PairEX and PairAX that take an OBDD
$P(\psi)$ of pairs of states as an argument, and output an OBDD of pairs as follows:
$PairEX(P(\psi)) = \{(w, v) :$ there exists a successor $u$ of $w$ such that $(u, v) \in P(\psi)\}$.
The procedure $PairAX(P(\psi))$ is defined dually. Thus, the procedures PairEX and
PairAX work as in symbolic CTL model checking, only that here we compute sets
of pairs instead of sets of singletons. The procedures apply the modalities to the first
element of the pair, assuming the second is fixed.

We can now describe the algorithm. Given a CTL formula $\psi$, we define $P(\psi)$ according
to the structure of $\psi$ as follows (we describe here only the existential modalities. The
universal ones are defined dually using PairAX, and are described in the full version).

- $\psi = p$ for an atomic proposition $p \neq q$. Obviously, changing the value of $q$ in some
state does not affect the value of $p$ in any state. Therefore, $P(p) = \{(w, v) : p \in L(w)\}$.

- $\psi = q$. Since the satisfaction of $q$ in $w$ depends only on the labeling of $w$, changing
the value of $q$ in some state $v$ affects it iff $v = w$. Therefore,
$P(q) = \{(w, v) : w \neq v, q \in L(w)\} \cup \{(w, w) : q \notin L(w)\}$.

- $\psi = \psi_1 \vee \psi_2$. A state $w$ satisfies $\psi$ in the dual structure $\tilde{K}_{v,q}$ iff it satisfies either
$\psi_1$ or $\psi_2$ in $\tilde{K}_{v,q}$. Therefore, $P(\psi) = P(\psi_1) \cup P(\psi_2)$.

- $\psi = \neg\psi_1$. A state $w$ satisfies $\psi$ in the dual structure $\tilde{K}_{v,q}$ iff it does not satisfy $\psi_1$
in $\tilde{K}_{v,q}$. Therefore, $P(\psi) = (W \times W) \setminus P(\psi_1)$.

- $\psi = EX\psi_1$. A state $w$ satisfies $\psi$ in the dual structure $\tilde{K}_{v,q}$ iff there exists a
successor of $w$ that satisfies $\psi_1$ in $\tilde{K}_{v,q}$. Therefore, $P(\psi) = PairEX(P(\psi_1))$.

- $\psi = E\psi_1 U \psi_2$. A state $w$ satisfies $\psi$ in the dual structure $\tilde{K}_{v,q}$ iff $w$ satisfies $\psi_2$
in $\tilde{K}_{v,q}$, or $w$ satisfies $\psi_1$ in $\tilde{K}_{v,q}$ and there exists a successor of $w$ that satisfies
$\psi$ in $\tilde{K}_{v,q}$. Therefore, $P(\psi)$ is computed using the least fixed-point expression
$\mu y. P(\psi_2) \vee (P(\psi_1) \wedge PairEX(y))$.
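
The inductive definition of $P(\psi)$ can be prototyped with explicit Python sets of pairs standing in for OBDDs. This is a sketch under that simplifying assumption, not the symbolic implementation: formulas are nested tuples, and the names `pair_ex`, `pairs`, `q_cover_from_pairs`, and the three-state example are hypothetical.

```python
from itertools import product


def pair_ex(W, R, P):
    """PairEX: apply EX to the first component, keeping the second fixed."""
    return {(w, v) for w, v in product(W, W)
            if any((u, v) in P for u in R[w])}


def pairs(W, R, L, q, f):
    """P(f): pairs (w, v) such that w satisfies f once q is flipped in v."""
    op = f[0]
    if op == "ap" and f[1] != q:
        return {(w, v) for w, v in product(W, W) if f[1] in L[w]}
    if op == "ap":  # the observed atom q itself
        return ({(w, v) for w, v in product(W, W) if w != v and q in L[w]}
                | {(w, w) for w in W if q not in L[w]})
    if op == "not":
        return set(product(W, W)) - pairs(W, R, L, q, f[1])
    if op == "or":
        return pairs(W, R, L, q, f[1]) | pairs(W, R, L, q, f[2])
    if op == "EX":
        return pair_ex(W, R, pairs(W, R, L, q, f[1]))
    if op == "EU":
        p1, p2 = pairs(W, R, L, q, f[1]), pairs(W, R, L, q, f[2])
        y = set()
        while True:  # least fixed point of y = P2 | (P1 & PairEX(y))
            nxt = p2 | (p1 & pair_ex(W, R, y))
            if nxt == y:
                return y
            y = nxt
    raise ValueError(f"unknown operator {op!r}")


def q_cover_from_pairs(W, R, L, w_in, q, f):
    """The q-covered set: states v with (w_in, v) not in P(f)."""
    P = pairs(W, R, L, q, f)
    return {v for v in W if (w_in, v) not in P}


# Hypothetical structure: state 0 branches to the q-labeled sinks 1 and 2.
W = [0, 1, 2]
R = {0: {1, 2}, 1: {1}, 2: {2}}
L = {0: set(), 1: {"q"}, 2: {"q"}}
AX_q = ("not", ("EX", ("not", ("ap", "q"))))
E_notq_U_q = ("EU", ("not", ("ap", "q")), ("ap", "q"))
cover_ax = q_cover_from_pairs(W, R, L, 0, "q", AX_q)
cover_eu = q_cover_from_pairs(W, R, L, 0, "q", E_notq_U_q)
```

A single bottom-up pass yields the answer for every candidate state $v$ at once, which is the point of working with pairs; the price is that each set ranges over $W \times W$.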

The symbolic algorithm for CTL model checking uses a linear number of OBDD
variables. The algorithm we present here doubles the number of OBDD variables, as it
works with sets of pairs of states instead of sets of states. By the nature of the algorithm,
it performs model checking for all $\tilde{K}_{v,q}$ globally, and thus the OBDDs it computes
contain information about the satisfaction of the specification in all the states of all the
dual Kripke structures, and not only in their initial states.

The algorithm is described for the case that the Kripke structure $K$ has one initial
state, and can be easily extended to handle a set $W_{in}$ of initial states. Indeed, given the
set $P(\varphi)$ of pairs computed by the algorithm, the $q$-covered set in $K$ for $\varphi$ is the set
of all states $w$ such that there is an initial state $w_{in} \in W_{in}$ such that $(w_{in}, w) \notin P(\varphi)$.



3.2 Algorithm 2: Improving Average Complexity 

Consider a Kripke structure $K = \langle AP, W, R, w_{in}, L \rangle$, a formula $\varphi$, and an atomic
proposition $q$. Recall that the naive CTL coverage algorithm, which performs model checking
for all dual Kripke structures, has running time of $O(|K| \cdot |\varphi| \cdot |W|)$. While for some
dual Kripke structures model checking may require less than $O(|K| \cdot |\varphi|)$, the naive
algorithm always performs $|W|$ iterations of model checking; thus, its average
complexity cannot be substantially better than its worst-case complexity. This unfortunate
situation arises even when model checking of two dual Kripke structures is practically
the same, and even when some of the states of $K$ obviously do not affect the satisfaction
of $\varphi$ in $K$. In this section we present an algorithm that makes use of such overlaps and
redundancies. The expected running time of our algorithm is $O(|K| \cdot |\varphi| \cdot \log |W|)$.
Formally, we have the following:

Theorem 1. The set $q$-cover$(K, \varphi)$ can be computed in average running time of
$O(|K| \cdot |\varphi| \cdot \log |W|)$, where the average is taken with respect to all possible assignments of $q$
in $K$.

All possible assignments of q in K are all possible labelings of the structure K with the 
observable signal q, where the value of q is chosen in each state of K to be true or false 
with equal probability. 

Our algorithm is based on the fact that for each $w$, the dual Kripke structure $\tilde{K}_{w,q}$
differs from $K$ only slightly. Therefore, there should be a large amount of work that we
can share when we model check all the dual structures. In order to explain the algorithm,
we introduce the notion of incomplete model checking. Informally, incomplete model
checking of $K$ is model checking of $K$ with its labeling function $L$ partially defined.
The solution to the incomplete model checking problem can rely only on the truth values
of the atomic propositions in states for which the corresponding $L$ is defined. Obviously,
in the general case we are not guaranteed to solve the model-checking problem without
knowing the values of all atoms in all states. We can, however, perform some work in
this direction, which need not be performed again when missing parts of $L$ are
revealed.

Consider a partition of $W$ into two equal sets, $W_1$ and $W_2$. Our algorithm essentially
works as follows. For all the dual Kripke structures $\tilde{K}_{w,q}$ such that $w \in W_1$, the states
in $W_2$ maintain their original labeling. Therefore, we start by performing incomplete
model checking of $\varphi$ in $K$ with $L$ that does not rely on the values of $q$ in states in $W_1$.
We end up in one of the following two situations. It may be that the values of $q$ in states
in $W_2$ (and the values of all the other atomic propositions in all the states) are sufficient
to imply the satisfaction of $\varphi$ in $K$. Then, we can infer that all the states in $W_1$ are not
$q$-covered. It may also be that the values of $q$ in states in $W_2$ are not sufficient to imply the
satisfaction of $\varphi$ in $K$. Then, we continue and partition the set $W_1$ into two equal sets,
$W_{11}$ and $W_{12}$, and perform incomplete model checking that does not rely on the values
of $q$ in states in $W_{11}$. The important observation is that incomplete model checking
is now performed in a Kripke structure to which we have already applied incomplete
model checking in the previous iteration. Thus, we only have to propagate information
that involves the values of $q$ in $W_{12}$. Hence, as we go deeper in the recursion described
above, we perform less work. The depth of the recursion is bounded by $\log |W|$. As we
shall analyze exactly below, the work in depth $i$ amounts on average to model checking
of $\varphi$ in a Kripke structure of size $|K|/2^i$. Hence the $O(|K| \cdot |\varphi| \cdot \log |W|)$ complexity.
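
The halving recursion can be illustrated on a small acyclic boolean circuit whose only unknown leaves are the values of $q$. The Python sketch below is a simplification under stated assumptions: the circuit is a nested tuple built from `and`, `or`, and `lit` nodes, cycles are not handled, the circuit mentions only states listed in `unassigned`, and the names `shrink` and `q_cover` are hypothetical.

```python
def shrink(c, known):
    """Simplify circuit c given the q-values of the states in `known`."""
    if isinstance(c, bool):
        return c
    if c[0] == "lit":  # ("lit", w, positive): q at w, or its negation
        _, w, positive = c
        return (known[w] == positive) if w in known else c
    left, right = shrink(c[1], known), shrink(c[2], known)
    if c[0] == "and":
        if left is False or right is False:
            return False
        if left is True:
            return right
        return left if right is True else ("and", left, right)
    if left is True or right is True:  # remaining case: an "or" gate
        return True
    if left is False:
        return right
    return left if right is False else ("or", left, right)


def q_cover(c, unassigned, q):
    """States in `unassigned` whose single flip of q makes circuit c false."""
    if c is True or not unassigned:
        return set()  # the known values already imply satisfaction
    if len(unassigned) == 1:
        w = unassigned[0]
        return {w} if shrink(c, {w: not q[w]}) is False else set()
    mid = len(unassigned) // 2
    first, second = unassigned[:mid], unassigned[mid:]
    # Reveal the true values of one half, then recurse into the other half
    # on the already-shrunk circuit, so earlier work is not repeated.
    c_first = shrink(c, {w: q[w] for w in second})
    c_second = shrink(c, {w: q[w] for w in first})
    return q_cover(c_first, first, q) | q_cover(c_second, second, q)


# Hypothetical circuit for "q at state 0 AND (q at state 1 OR q at state 2)".
circuit = ("and", ("lit", 0, True), ("or", ("lit", 1, True), ("lit", 2, True)))
all_true = q_cover(circuit, [0, 1, 2], {0: True, 1: True, 2: True})
one_false = q_cover(circuit, [0, 1, 2], {0: True, 1: True, 2: False})
```

When the shrunk circuit collapses to the constant true, a whole block of states is declared uncovered without examining its members one by one, which is where the savings over the naive algorithm come from.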

It is easier to understand and analyze incomplete model checking by means of the
automata-theoretic approach to branching time model checking [KVW00]. In this
approach, we transform the specification $\varphi$ to an alternating automaton $A_\varphi$ that accepts
exactly all the models of $\varphi$. Model checking of $K$ with respect to $\varphi$ is then reduced to
checking the nonemptiness of the product $A_{K,\varphi}$ of $K$ and $A_\varphi$. When $\varphi$ is in CTL, the
automaton $A_\varphi$ is linear in the length of $\varphi$, thus the product is of size $O(|K| \cdot |\varphi|)$.

The product $A_{K,\varphi}$ can be viewed as a boolean circuit $G_{K,\varphi}$ (unlike boolean circuits,
the product $A_{K,\varphi}$ may contain cycles. In the full version we show how to handle these
cycles). The root of $G_{K,\varphi}$ is the vertex $\langle w_{in}, \varphi \rangle$. The leaves of $G_{K,\varphi}$ are pairs $\langle w, p \rangle$
or $\langle w, \neg p \rangle$. The formula $\varphi$ is assumed to be in a positive normal form, thus negation
applies only to the leaves of $G_{K,\varphi}$. The inner vertices of $G_{K,\varphi}$ are labeled with true
or false, and the leaves are labeled with literals (variables or their negations). Initially,
each leaf $\langle w, p \rangle$ or $\langle w, \neg p \rangle$ has a value, true or false, depending on the membership of
$p$ in $L(w)$. The graph has at most $2 \cdot |AP| \cdot |W|$ leaves. Intuitively, incomplete model
checking corresponds to shrinking a boolean circuit, part of whose inputs are known,
to an equivalent minimal circuit. The shrinking procedure (described in detail in the
full version of this paper) replaces, for example, an OR-gate one of whose successors is
a leaf with value true, with a leaf with value true. In each iteration of the algorithm
we assign values to half of the unassigned leaves of $G_{K,\varphi}$ and leave the other half
unassigned. Recall that our average complexity is with respect to all assignments of
$q$ in $K$. Therefore, though we work with a specific Kripke structure, and the values
assigned to half of the leaves are those induced by the Kripke structure, the average
case corresponds to one in which we randomly assign to each unassigned leaf the value
true with probability $\frac{1}{4}$, the value false with probability $\frac{1}{4}$, and leave it unassigned with
probability $\frac{1}{2}$. The complexity described in Theorem 1 then follows from the following
result from circuit complexity.

Theorem 2. Boolean circuits shrink by at least the factor of $\epsilon$ under a random
assignment that leaves the fraction of $\epsilon$ variables unassigned and assigns true or false to
other variables with equal probability.

Theorem 2 follows from the fact that boolean circuits for parity and co-parity, which are
the most “shrink-resistant” circuits (shrinking a parity or co-parity circuit according to a
random partial assignment to the variables results in a parity or a co-parity circuit for the
remaining variables), are linear in the number of their variables (see [Weg87,Nig90]).
Let $m$ be the size of the graph $G_{K,\varphi}$. By Theorem 2, incomplete model checking in
depth $i$ of the algorithm handles graphs of size $m/2^i$, and there are $2^i$ such graphs. Hence,
the overall average running time is

$O(m) + 2 \cdot O(m/2) + 4 \cdot O(m/4) + \ldots$,

with one such term for each level of the recursion, which is equal to $O(m \cdot \log |W|)$.
Since $m$ is $O(|K| \cdot |\varphi|)$, the $O(|K| \cdot |\varphi| \cdot \log |W|)$ complexity follows.
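
Spelled out, the summation behind this bound can be written as follows. This is a short worked derivation under the assumptions already stated in the text, namely that level $i$ of the recursion handles $2^i$ graphs of size $m/2^i$ and that the depth $d$ is at most $\log |W|$; the symbol $d$ is introduced here only for the derivation.

```latex
\sum_{i=0}^{d} 2^{i} \cdot O\!\left(\frac{m}{2^{i}}\right)
  \;=\; \sum_{i=0}^{d} O(m)
  \;=\; O\bigl(m \cdot (d+1)\bigr)
  \;=\; O(m \cdot \log |W|),
\qquad d \le \log |W|, \quad m = O(|K| \cdot |\varphi|).
```

Each level therefore costs as much as one ordinary model-checking run, and the logarithmic factor counts the levels.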

Remark 1. In fact, the algorithm usually performs better. First, since the graph $G_{K,\varphi}$
is induced by a top-down reasoning, it may not contain leaves for all $\langle w, q \rangle \in W \times \{q\}$.
States $w$ such that $\langle w, q \rangle$ does not exist in $G_{K,\varphi}$ are obviously not $q$-covered, and
the algorithm indeed ignores them. In addition, note that in the first iteration of the
algorithm (that is, the first shrinking of $G_{K,\varphi}$), the unassigned leaves are exactly all the
leaves of $G_{K,\varphi}$ of the form $\langle w, q \rangle$, and all the other leaves have values. While we cannot
apply Theorem 2 directly and get shrinkage with $\epsilon = \frac{1}{2}$ for this case (this is since we
defined the average performance of the algorithm with respect to random assignments
to $q$ only, and since $G_{K,\varphi}$ typically contains less than $|W| \cdot |AP|$ leaves), we can still
expect significant shrinkage in this call.

The algorithm can be easily adjusted to handle multiple initial states, where we 
check whether the modification falsifies the specification in some initial state. A naive 
adjustment repeats the algorithm for each of the initial states. A better adjustment han- 
dles all the initial states together, say by adding a new single initial state with transitions 
to all the initial states, and replacing the specification $\varphi$ by the specification $AX\varphi$.

4 Presentation of the Output 

Once we have found the parts of the system that are not covered, there are several ways to
make use of this information. For a circuit $S$ and a signal $x$, let $x$-cover$(S, \varphi)$ denote
the set of states $x$-covered by $\varphi$ in $S$. One can compute $x$-cover$(S, \varphi)$ for several output
signals $x \in C \cup O$ (possibly, each of the properties that together compose the
specification has a different set of signals that are of potential relevance to it). In [HKHZ99],
coverage is defined as the ratio between the number of states that are members of
$x$-cover$(S, \varphi)$ for at least one signal $x$ and the number of states in the design. This ratio
indeed gives a good estimation of the part of the system that has been relevant for the
verification process to succeed. In their implementation, Hoskote et al. also generate
computations that lead to uncovered states.

We believe that once we have worked hard and calculated $x$-cover$(S, \varphi)$ for signals $x$,
there are more helpful things that can be done with these sets. First, before merging
$x$-covered sets for different signals, it is good to check whether $x$-cover$(S, \varphi)$ is empty
for some of the $x$'s in isolation. An empty set may indicate vacuity in the specification
(see also Section 5). In addition, we can use the sets $x$-cover$(S, \varphi)$ in order to generate
uncovered computations. Consider a state $s$ of $S$ that is not $x$-covered. The fact that
$s$ is not $x$-covered means that the specification fails to distinguish between $S$ and the
dual circuit $\tilde{S}_{s,x}$. Therefore, possible errors caused by an erroneous value of $x$ in $s$ are
not captured by the specification. So, a computation that contains the state $s$ may be
an interesting output. Even more interesting are computations that contain no covered
states or many states that are not covered. Indeed, such computations correspond to
behaviors of the circuit that are not referred to by the specification. Recall that the circuit
models an open system. Thus, these computations are induced by input sequences that
are ignored by the specification.

It is not hard to generate computations that consist of uncovered states only (if such
computations exist) — we just have to restrict the transition function of S to uncovered 
states, which can also be done symbolically. In addition, computations that have many 
uncovered states can be found symbolically using the following greedy heuristic: we 
start from the initial state. Whenever we are in state s, we apply the post operator 
(given a set T of states, post{T) returns the set of successors of states in T) until we 
encounter a set of states that contains an uncovered state, from which we continue. 
Alternatively, we can start with maximal computations consisting of uncovered states 
only, and connect them with transitions via covered states. 
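
The greedy heuristic can be sketched for explicit-state structures as a repeated breadth-first search, where each layer of the search plays the role of one application of the post operator. The Python below is an illustration only, not the symbolic procedure: it assumes states are comparable values such as integers, and the function name `greedy_uncovered_path`, the parameter `hops`, and the example graph are hypothetical.

```python
def greedy_uncovered_path(R, w_in, uncovered, hops):
    """Follow a shortest route to the next uncovered state, `hops` times."""
    path, current = [w_in], w_in
    for _ in range(hops):
        parent, seen, frontier, hit = {}, set(), {current}, None
        while frontier and hit is None:
            layer = set()  # states first reached by this application of post
            for s in sorted(frontier):
                for t in sorted(R[s]):
                    if t not in seen:
                        seen.add(t)
                        parent[t] = s
                        layer.add(t)
            hits = sorted(layer & uncovered)
            hit = hits[0] if hits else None
            frontier = layer
        if hit is None:
            break  # no uncovered state is reachable any more
        # Walk the parent pointers back to the state we started from.
        segment, node = [], hit
        while True:
            segment.append(node)
            node = parent[node]
            if node == current:
                break
        path.extend(reversed(segment))
        current = hit
    return path


# Hypothetical structure in which only state 3 is uncovered.
R = {0: {1, 2}, 1: {3}, 2: {2}, 3: {3}}
found = greedy_uncovered_path(R, 0, {3}, hops=2)
none_reachable = greedy_uncovered_path(R, 2, {3}, hops=2)
```

The resulting prefix visits an uncovered state as early as possible after each restart, which tends to produce computations dense in uncovered states without any optimality guarantee.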

The above methodology can be applied also to the logic-based definition of cover- 
age. Obviously, if we discover that a control signal x can be fixed to 1 or to 0 without 
violating the specification, this means that x is useless in the circuit according to the 
specification. This should not happen in a correct implementation of a complete speci- 
fication. Therefore, a valuable output in this case is the list of uncovered control signals. 

Recall that model-checking tools accompany a negative answer to the correctness 
query by a counterexample to the satisfaction of the specification. When model
checking succeeds, we suggest accompanying the positive reply with two computations: one
is the interesting witness of [BBER97], which describes a behavior of the system that 
cares about the satisfaction of all the subformulas of the specification. The second is 
a non-interesting witness: a computation that is not covered by the specification. We 
believe that both outputs are of great importance to the user. While the first describes 
some nontrivial correct behavior of the system, the second describes a possibly incorrect 
behavior of the system that escaped the verification efforts. 

5 Discussion 

In this section we briefly discuss some more aspects of coverage metrics for temporal 
logic model checking. An extended discussion can be found in the full version. 

Definition of coverage metrics There are several interesting possible relaxations of
the definitions given in Section 2. One of them is allowing nondeterministic circuits.
In nondeterministic circuits we can examine more coverage criteria: check the circuit
obtained by merging $s$ and $twin_x(s)$, check the circuit obtained by fixing a control
signal to “don’t care” (i.e., replace a transition to $\alpha \subseteq C$ with two transitions, to $\alpha \cup \{x\}$
and to $\alpha \setminus \{x\}$), etc. Another possibility is to allow different types of modifications in
the system, for example flipping the value of $q$ simultaneously in several states. Our
definitions and algorithms can be easily modified to handle simultaneous modifications
of the system. The corresponding algorithms, however, need to examine exponentially
many modified systems and their complexity is much higher. Finally, we can think of
$\delta$ as a function that maps the output signals to true, false, or don’t care, thus allowing
the designer to specify in advance that the values of certain signals are not important in
some states. In this case we say that a system satisfies a specification $\varphi$ if $\varphi$ is satisfied
no matter how $q$ is assigned in states where its value is don’t care. Our definitions of
coverage apply also to designs with incomplete $\delta$ as above. Our algorithms can be easily
adjusted to such designs.

Properties of coverage metrics The covered sets defined in Section 2 are sensitive to
abstraction, in the sense that there is no correlation between the covered states in a
system and its abstraction. On the other hand, the set of covered states is not sensitive to
applying cone of influence reduction, where we abstract away parts of the system that
do not contain variables appearing in the specification or influence variables that appear
in the specification [CGP99]. We also note that the notions of coverage and vacuity
[BBER97,KV99] are closely related. Vacuity can be viewed as a coverage metric for
the specification. Also, if there is a signal $x$ that does not influence the satisfaction of $\varphi$
in the system, then no state in the system is $x$-covered by $\varphi$. Our coverage metrics are
compositional, in the sense that the intersection of the uncovered sets for the underlying
conjuncts is equivalent to the uncovered set for their conjunction. Formally, if $S_1$ is the
set of states not $q$-covered by $\varphi_1$ and $S_2$ is the set of states not $q$-covered by $\varphi_2$, then
$S_1 \cap S_2$ is the set of states not $q$-covered by $\varphi_1 \wedge \varphi_2$. On the other hand, the coverage
criteria defined in [KGG99], as well as the covered sets found by the algorithm in
[HKHZ99], are not compositional.

There are still several open issues in the adaptation of coverage metrics to temporal 
logic model checking. For example, it is not clear whether and how a coverage metric 
that aims at checking incompleteness of the specification should be different from a 
metric that aims at finding redundancies in the system. Another issue is the feasibility 
of coverage algorithms. While the algorithms that we presented have better complexity 
than the naive algorithm, their complexity is still larger than that of model checking. This may 
prevent a wide use of coverage metrics for temporal logic in formal verification. Clearly, 
there is room for improving the current algorithms, as well as for searching for new 
ones, both for CTL and for other temporal logics, in particular LTL. Finally, while 
it is clear that outputs such as those described in Section 4 are very helpful to the user, it 
is not clear whether high coverage is something one should seek. Indeed, there is 
a trade-off between complete and abstract specifications, and neither completeness nor 
abstraction has clear priority. 
Abstract. In this paper, we describe the design and implementation 
of a parallel model checking algorithm for the alternation free fragment 
of the μ-calculus. It exploits a characterisation of the model checking 
problem for this fragment in terms of two-person games. Our algorithm 
determines the corresponding winner in parallel. It is designed to run 
on a network of workstations. An implementation within the verification 
tool Truth shows promising results. 



1 Introduction 

Model checking is becoming more and more popular for the verification of complex 
hardware and software systems. These systems are usually given by a formal 
description which can be transformed into a transition system. A desired property 
of the system, on the other hand, is usually specified as a formula of a 
temporal logic. A model checking algorithm answers the question whether the 
(transition) system satisfies this property. Numerous case studies have shown 
that this approach improves the early detection of errors [3]. 

Despite the developments in the last years, the so-called state space explosion 
still limits its application. While partial order reduction [20] or symbolic model 
checking [17] reduce the state space by orders of magnitude, typical verification 
tasks still take days on a single workstation or are even (practically) undecidable 
due to memory restrictions (see for example [7]). 

On the other hand, cheap yet powerful parallel computers can be constructed 
of Networks Of Workstations (NOWs). From the outside, a NOW appears as 
one single parallel computer with high computing power and, even more important, 
huge amounts of memory. This enables parallel programs to utilise the 
accumulated resources of a NOW to solve large problems. Hence, it is a fundamental 
goal to find parallel model checking algorithms which then may be 
combined with well-known techniques to avoid the state space explosion, gaining 
even more speedup and further reducing memory requirements. 

A famous logic for expressing specifications is Kozen's μ-calculus [11], a 
temporal logic offering boolean combination of formulae and, especially, labelled 
next-state, minimal, and maximal fixpoint quantifiers. For practical applications, 
however, it suffices to restrict the μ-calculus in order to gain tractable model 
checking procedures. The alternation free fragment, denoted by $L_\mu^1$, prohibits 
the nesting of minimal and maximal fixpoint operators. It allows the formulation 
of many safety as well as liveness properties. While this fragment is already 
important on its own, it also subsumes the logic CTL [6] which is employed in many 
practical verification tools. It can be shown that the model checking problem 
for this fragment is linear in the length of the formula as well as the size of the 
underlying transition system, and, starting with [4], several sequential model 
checking procedures are given in the literature (see [13] for a comparison). The 
algorithms can be classified into global and local algorithms. Global algorithms 
require the underlying transition system to be completely constructed while 
local algorithms compute the necessary part of a transition system on-the-fly. 
In plain words, global algorithms typically compute the fixpoints in an inductive 
manner while the local algorithms decide the problem by a depth-first-search. 
[13] compares the algorithms in detail. 

Before starting to think about a concrete algorithm, we should consider its 
limitations, i.e. its complexity. In complexity theory, it is a well-accepted view 
that problems within the class NC admit a promising parallel computing algo- 
rithm. NC is based on the Boolean circuit model for computation and describes 
the problems computable in polylogarithmic time with polynomially many pro- 
cessors. It can be shown that NC is contained in P. Problems outside of NC are 
consequently considered to be inherently sequential. However, it is not known 
whether NC=P. If not, then especially P-complete problems cannot be in NC. 
Hence, we call P-complete problems inherently sequential [8]. 

We show that model checking $L_\mu^1$ is inherently sequential, limiting our enthusiasm 
for finding a (theoretically) good parallel model checking algorithm. 
Even worse, depth-first-search is also P-complete, hence, promising parallel local 
algorithms are unlikely to exist [21]. Despite these theoretical limitations, we 
present a parallel global model checking algorithm. We implemented it within 
our verification tool Truth [14] and found that it behaves very well for 
many practical problems. 

Our algorithm is based on a characterisation of the model checking problem 
for this fragment in terms of two-person games due to Stirling [24]. Strictly 
speaking, we present a parallel algorithm for colouring so-called game graphs 
corresponding to the underlying model checking problem. This colouring answers 
the model checking problem and allows a derivation of a winning strategy. The 
latter may be employed by the user of a verification tool for debugging the 
underlying system interactively [24]. 

Another characterisation of this model checking problem can be given in 
terms of so-called 1-simple-weak-alternating-Büchi automata [12]. However, 
these correspond to game-based model checking [16]. Hence, our algorithm can 
also be used for checking the emptiness of these automata in parallel. 

To date, little effort has been devoted to parallel model checking 
algorithms. [25,1] present parallelised data structures which employ further 
computers within a network as a substitute for external storage. The algorithms 
described in [19,2] divide the underlying problem into several tasks. However, 
they are designed in such a way that only a single computer can be employed to 
sequentially handle one task at a time. Stern and Dill [23] show how to carry 
out a parallel reachability analysis. The distribution of the underlying structure 
is similar to the one presented here, but their algorithm is not suitable 
for model checking temporal logic formulae. [9] presents a parallel reachability 
analysis algorithm for BDDs. They argue that many safety properties can be 
formulated as a reachability problem. In this way, their algorithm allows checking 
safety formulae. However, liveness properties which can be expressed within 
$L_\mu^1$ are not supported. Furthermore, [10] argues that explicit state representation 
as well as BDDs have application domains in which they outperform the 
other one. Moreover, BDD-based algorithms generally do not provide counterexamples, 
which are important in practice. Our main contribution is the first 
parallel model checking algorithm for $L_\mu^1$ that supports interactive debugging. 

The syntax and semantics of the μ-calculus are introduced in the next section. 
Furthermore, it is shown that model checking $L_\mu^1$ is inherently sequential. In Section 3, 
we describe model checking games for the μ-calculus and provide an important 
characterisation of the game graph which will be the basis for our parallel algorithm. 
Section 4 discusses our parallel model checking procedure and is followed 
by experimental results. We conclude by summing up our approach as well as 
giving directions for future work. 

2 The μ-Calculus 

In this section, we recall the syntax and semantics of the (modal) μ-calculus and 
its alternation free fragment. Furthermore, we show that model checking this 
fragment is inherently sequential. 



2.1 Syntax and Semantics 

Let $\mathit{Var}$ be a set of variables and $A$ a set of actions. Formulae of the modal 
μ-calculus over $\mathit{Var}$ and $A$ in positive form as introduced by [11] are defined as 

$$\varphi ::= \mathit{false} \mid \mathit{true} \mid X \mid \varphi_1 \wedge \varphi_2 \mid \varphi_1 \vee \varphi_2 \mid [K]\varphi \mid \langle K \rangle \varphi \mid \nu X.\varphi \mid \mu X.\varphi$$

where $X \in \mathit{Var}$ and $K$ ranges over subsets of actions $A$. Like [24], we allow sets 
of actions instead of single actions appearing in modalities.¹ It is a simple exercise 
to extend our approach towards the propositional μ-calculus (cf. Section 4). 

A formula $\varphi$ is normal if every occurrence of a binder $\mu X$ or $\nu X$ in $\varphi$ binds a 
distinct variable, and no free variable $X$ in $\varphi$ is also used in a binder $\mu X$ or $\nu X$. 
Every formula can easily be converted into an equivalent normal formula. If a 
formula $\varphi$ is normal, every bound variable $X$ of $\varphi$ identifies a unique subformula 
$\mu X.\psi$ or $\nu X.\psi$ of $\varphi$ where $X$ is a free variable of $\psi$. 

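Let $\mathcal{T} = (S, T, A, s_0)$ be a labelled transition system where $S$ is a finite set of 
states, $A$ a set of actions, and $T \subseteq S \times A \times S$ denotes the transitions. As usual, 
we write $s \xrightarrow{a} t$ instead of $(s, a, t) \in T$. Furthermore, let $s_0 \in S$ be the initial 
state of the transition system. We employ valuations $V$ mapping a variable $X$ 
to a set of states $V(X) \subseteq S$. Let $V[X/E]$, $E \subseteq S$, be the valuation which is the 
same as $V$ except for $X$ where $V[X/E](X) = E$. Given a labelled transition system 
$\mathcal{T} = (S, T, A, s_0)$ and a formula $\varphi$ over $\mathit{Var}$ and $A$, the satisfaction of $\varphi$ wrt. $\mathcal{T}$ 
and a state $s \in S$ is defined for true, false, disjunction, and conjunction in the 
usual way, and for the temporal and fixpoint operators as follows: 

$$
\begin{array}{lcl}
\mathcal{T}, s \models_V [K]\varphi & \text{iff} & \forall a \in K,\ \forall t \in S:\ \text{if } s \xrightarrow{a} t \text{ then } \mathcal{T}, t \models_V \varphi \\
\mathcal{T}, s \models_V \langle K \rangle \varphi & \text{iff} & \exists a \in K,\ \exists t \in S:\ s \xrightarrow{a} t \text{ and } \mathcal{T}, t \models_V \varphi \\
\mathcal{T}, s \models_V \mu X.\varphi & \text{iff} & \forall E \subseteq S:\ \text{if } s \notin E \text{ then } \exists t \in S:\ t \notin E \text{ and } \mathcal{T}, t \models_{V[X/E]} \varphi \\
\mathcal{T}, s \models_V \nu X.\varphi & \text{iff} & \exists E \subseteq S,\ s \in E:\ \forall t \in E:\ \mathcal{T}, t \models_{V[X/E]} \varphi
\end{array}
$$

¹ $\langle - \rangle \varphi$ is an abbreviation for $\langle A \rangle \varphi$. 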

We shorten $\mathcal{T}, s \models_V \varphi$ by $\mathcal{T}, s \models \varphi$ for a formula $\varphi$ without any free variables 
and $\mathcal{T}, s_0 \models_V \varphi$ by $\mathcal{T} \models_V \varphi$. We use identifiers like $\varphi, \psi, \ldots$ for formulae, $s, t, \ldots$ 
for states, and $a, b, \ldots$ for actions of the transition system under consideration. 
$K$ denotes a set of actions. Whenever the sort of the fixpoint does not matter, 
we use $\sigma$ for either $\mu$ or $\nu$. For a formula $\varphi$ of the μ-calculus, we introduce the 
notion of subformulae and free and bound variables as usual. 
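
The semantics above can be made concrete by a naive global evaluator that computes fixpoints by iteration over a finite transition system. The following is only a minimal sketch: the tuple encoding of formulae and the function name `evaluate` are assumptions made for this illustration and are not taken from the paper or from the tool Truth.

```python
# Formulae are nested tuples (an encoding chosen only for this sketch):
#   ("true",), ("false",), ("var", X), ("and", f, g), ("or", f, g),
#   ("box", K, f), ("dia", K, f), ("mu", X, f), ("nu", X, f)
# where K is a set of actions and X a variable name.

def evaluate(formula, states, trans, val=None):
    """Return the set of states of a finite LTS that satisfy `formula`.

    states: set of states; trans: set of (s, a, t) triples;
    val: valuation, a dict mapping variable names to sets of states.
    """
    val = dict(val or {})
    op = formula[0]
    if op == "true":
        return set(states)
    if op == "false":
        return set()
    if op == "var":
        return set(val[formula[1]])
    if op in ("and", "or"):
        left = evaluate(formula[1], states, trans, val)
        right = evaluate(formula[2], states, trans, val)
        return left & right if op == "and" else left | right
    if op in ("box", "dia"):
        actions, body = formula[1], formula[2]
        target = evaluate(body, states, trans, val)
        result = set()
        for s in states:
            succs = [t for (p, a, t) in trans if p == s and a in actions]
            if op == "box" and all(t in target for t in succs):
                result.add(s)
            if op == "dia" and any(t in target for t in succs):
                result.add(s)
        return result
    if op in ("mu", "nu"):
        var, body = formula[1], formula[2]
        # Least fixpoints are approximated from below, greatest from above.
        current = set() if op == "mu" else set(states)
        while True:
            val[var] = current
            nxt = evaluate(body, states, trans, val)
            if nxt == current:
                return current
            current = nxt
    raise ValueError(f"unknown operator: {op}")
```

Since the state set is finite and the formulae are in positive form, each iteration is monotone and the loops terminate. The sketch makes no attempt to reach the linear bound mentioned in the introduction.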

The alternation free fragment of the μ-calculus is that sublogic of the μ-calculus 
where no subformula $\psi$ of a formula $\varphi$ contains both a free variable $X$ 
bound by a $\mu X$ in $\varphi$ as well as a free variable $Y$ bound by a $\nu Y$ in $\varphi$. 
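
The definition of the alternation free fragment can be checked syntactically. The sketch below assumes a normal formula (every binder uses a distinct variable) and a hypothetical tuple encoding of formulae; the names `free_vars`, `subformulae` and `is_alternation_free` are illustrative only.

```python
# Formulae as nested tuples (encoding assumed for this sketch only):
#   ("true",), ("false",), ("var", X), ("and", f, g), ("or", f, g),
#   ("box", K, f), ("dia", K, f), ("mu", X, f), ("nu", X, f)

def free_vars(f):
    """Set of variables occurring free in formula f."""
    op = f[0]
    if op in ("true", "false"):
        return set()
    if op == "var":
        return {f[1]}
    if op in ("and", "or"):
        return free_vars(f[1]) | free_vars(f[2])
    if op in ("box", "dia"):
        return free_vars(f[2])
    return free_vars(f[2]) - {f[1]}          # "mu" / "nu" bind f[1]

def subformulae(f):
    """Yield f and all of its subformulae."""
    yield f
    op = f[0]
    if op in ("and", "or"):
        yield from subformulae(f[1])
        yield from subformulae(f[2])
    elif op in ("box", "dia", "mu", "nu"):
        yield from subformulae(f[2])

def is_alternation_free(phi):
    """True iff no subformula has free variables bound by both mu and nu in phi."""
    # For a normal formula, each bound variable has exactly one binder.
    sort = {g[1]: g[0] for g in subformulae(phi) if g[0] in ("mu", "nu")}
    for psi in subformulae(phi):
        kinds = {sort[x] for x in free_vars(psi) if x in sort}
        if kinds == {"mu", "nu"}:
            return False
    return True
```

For example, a formula of the shape νX.μY.(... X ... Y ...) is rejected, whereas μY.(⟨a⟩Y ∨ νX.[a]X) is accepted because the inner ν-formula does not mention Y.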

Given a labelled transition system $\mathcal{T}$ and a formula $\varphi$, model checking is the 
problem whether $\mathcal{T}$ satisfies $\varphi$, i.e. whether $\mathcal{T} \models \varphi$. The combined complexity of 
the model checking problem is its complexity wrt. the product of the size of the 
transition system and the size of the formula. Its program complexity considers 
the complexity only wrt. the size of the transition system. 

In [26], it was shown that the combined complexity for the alternation free 
μ-calculus is P-complete and, for a version of the alternation free μ-calculus 
employing two actions, that its program complexity is P-complete. [12] shows 
the latter result by using a formula with two propositions. We strengthen both 
results by employing neither propositions nor any action labelling. 

Lemma 1. The program complexity of the alternation free μ-calculus is P-hard. 

Proof. We reduce the P-complete Game Problem [8] to checking a formula of 
the alternation free μ-calculus wrt. a corresponding labelled transition system. 
A two player game is a tuple $G = (P_1, P_2, M, W_0, s)$. $P_1$ and $P_2$, $P_1 \cap P_2 = \emptyset$, 
are positions, in which it is the turn of Player 1 or Player 2, resp. $M \subseteq (P_1 \times 
P_2) \cup (P_2 \times P_1)$ is the set of moves the respective player can make. $W_0 \subseteq P_1 \cup P_2$ 
denotes the succeeding positions, and $s \in P_1$ the starting position. The players move 
alternately beginning with Player 1. We call $x \in P_1 \cup P_2$ winning iff either $x$ 
is succeeding ($x \in W_0$), or $x \in P_1$ and there is a winning $y \in P_2$ such that 
$(x, y) \in M$, or $x \in P_2$ and for all $(x, y) \in M$, $y$ is winning. The Game Problem 
is the question whether $s$ is winning. 

Corresponding to this, we define a transition system $\mathcal{T}_G = (P_1 \cup P_2, T)$ by 
$T = (M - \{(p, q) \in M \mid p \in W_0\}) \cup \{(p, p) \in P_1 \times P_1 \mid p \notin W_0$ and there is no 
transition from $p$ in $M\}$. 

$T$ is defined in the way that every deadlock state, i.e. state with no outgoing 
edges, is a state of $W_0$ or a state of $P_2$ in which Player 2 is not able to move. 
Hence, deadlocks are winning. They can be characterised in the μ-calculus by 
$\varphi_{W_0} = [-]\mathit{false}$ where a formula $[-]\varphi$ indicates that $\varphi$ is satisfied in all successor 
states. Further winning positions for Player 1 are states of $P_1$ such that there is 
a successor state (in $P_2$) whose direct successors (in $P_1$) are all winning. Hence, 
the formula $\varphi = \mu X.(\langle - \rangle [-] X \vee \varphi_{W_0})$ is satisfied in exactly those positions of $P_1$ 
which are winning, where $\langle - \rangle \varphi$ guarantees the existence of a successor state in 
which $\varphi$ holds. Note that $\varphi$ may be satisfied in further positions of $P_2$ which does 
not bother us. We conclude that $s$ is winning in the game $G = (P_1, P_2, M, W_0, s)$ 
iff $\mathcal{T}_G, s \models \varphi$. Observe that the construction of the transition system can be 
done in LOGSPACE. Note that we do not make use of propositions. Furthermore, 
we manage without actions at all by slightly adapting the modal fragment of 
our logic. 
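
The reduction can be tried out on small instances. The following sketch computes the winning positions of a game directly from the definition (read as a least fixpoint), builds the transition relation of the corresponding transition system, and evaluates the formula μX.(⟨−⟩[−]X ∨ [−]false) by iteration. All function names are hypothetical and chosen for this illustration.

```python
def winning_positions(p1, p2, moves, w0):
    """Winning positions of the Game Problem, computed as a least fixpoint."""
    win = set(w0)
    changed = True
    while changed:
        changed = False
        for x in (p1 | p2) - win:
            succ = [y for (u, y) in moves if u == x]
            if x in p1:
                ok = any(y in win for y in succ)   # Player 1 needs one winning move
            else:
                ok = all(y in win for y in succ)   # every move of Player 2 must be winning
            if ok:
                win.add(x)
                changed = True
    return win

def reduction(p1, p2, moves, w0):
    """Transition relation of the transition system defined in the proof."""
    kept = {(p, q) for (p, q) in moves if p not in w0}
    has_move = {p for (p, _) in moves}
    loops = {(p, p) for p in p1 if p not in w0 and p not in has_move}
    return kept | loops

def sat_phi(states, trans):
    """States satisfying mu X.(<->[-]X or [-]false), by fixpoint iteration."""
    succ = {s: {t for (p, t) in trans if p == s} for s in states}
    x = set()
    while True:
        box_x = {s for s in states if succ[s] <= x}            # [-]X
        nxt = {s for s in states if succ[s] & box_x}           # <->[-]X
        nxt |= {s for s in states if not succ[s]}              # [-]false
        if nxt == x:
            return x
        x = nxt
```

On positions of Player 1, the two computations are expected to agree, which is the statement used in the proof.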

Together with a linear-time algorithm [12,13], we have the following theorem: 

Theorem 1. Model checking the alternation free μ-calculus is inherently sequential 
wrt. the combined complexity as well as the program complexity. 

3 Games for the μ-Calculus 

Given a labelled transition system $\mathcal{T} = (S, T, A, s_0)$ and a formula $\varphi$ over $\mathit{Var}$ 
and $A$, we are able to define the model checking game. Its board is the Cartesian 
product $S \times \mathit{Sub}(\varphi)$ of the set of states and the set of subformulae. The game 
is played by two players, namely ∀belard (the pessimist), who wants to show 
that $\mathcal{T}, s_0 \models \varphi$ does not hold, whereas ∃loise (the optimist) wants to show the 
opposite. 

The model checking game $G(s, \varphi)$ is given by all its plays, i.e. (possibly 
infinite) sequences $C_0 \Rightarrow_{p_0} C_1 \Rightarrow_{p_1} C_2 \Rightarrow_{p_2} \ldots$ of configurations, where $C_0 = 
(s, \varphi)$ and for all $i$, $C_i \in S \times \mathit{Sub}(\varphi)$ and $p_i$ is either ∃loise or ∀belard. We write 
$\Rightarrow$ instead of $\Rightarrow_{p_i}$ if we abstract from the player. The players do not have to move 
alternately; instead, the next turn is determined by the current subformula of 
$\varphi$. Hence, the second component of a configuration $C_i$ determines the player 
$p_i$ who is to choose the next move. ∀belard makes universal $\Rightarrow_\forall$-moves, ∃loise 
makes existential $\Rightarrow_\exists$-moves. More precisely, whenever 

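1. $C_i = (s, \mathit{false})$, then the play is finished. 
2. $C_i = (s, \psi_1 \wedge \psi_2)$, then ∀belard chooses $\varphi = \psi_1$ or $\varphi = \psi_2$, and $C_{i+1} = (s, \varphi)$. 
3. $C_i = (s, [K]\psi)$, then ∀belard chooses $s \xrightarrow{a} t$ with $a \in K$ and $C_{i+1} = (t, \psi)$. 
4. $C_i = (s, \nu X.\psi)$, then $C_{i+1} = (s, \psi)$. 
5. $C_i = (s, \mathit{true})$, then the play is finished. 
6. $C_i = (s, \psi_1 \vee \psi_2)$, then ∃loise chooses $\varphi = \psi_1$ or $\varphi = \psi_2$, and $C_{i+1} = (s, \varphi)$. 
7. $C_i = (s, \langle K \rangle \psi)$, then ∃loise chooses $s \xrightarrow{a} t$ with $a \in K$ and $C_{i+1} = (t, \psi)$. 
8. $C_i = (s, \mu X.\psi)$, then $C_{i+1} = (s, \psi)$. 
9. $C_i = (s, X)$ and $X$ identifies $\varphi$, then $C_{i+1} = (s, \varphi)$. 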

As the moves 1, 4, 5, 8 and 9 are deterministic, no player needs to be charged with 
them. With regard to the winning strategies and the algorithm, we will speak of 
∀belard-moves in cases 1-4 and 9 if $\sigma = \nu$, and ∃loise-moves in all other cases. 
$C_i$ is called ∀-configuration or ∃-configuration, respectively. 
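
The nine rules can be read as a successor function on configurations. The sketch below returns, for a configuration, the player to whom the move is attributed ("A" for ∀belard, "E" for ∃loise) and the list of possible next configurations. The tuple encoding of formulae, the `binder` map from variables to the fixpoint subformula they identify, and the attribution of rule 9 to ∀belard exactly when the variable is bound by ν are assumptions of this illustration.

```python
# Formulae as nested tuples (encoding assumed for this sketch only):
#   ("true",), ("false",), ("var", X), ("and", f, g), ("or", f, g),
#   ("box", K, f), ("dia", K, f), ("mu", X, f), ("nu", X, f)

def moves(config, trans, binder):
    """Return (player, successor configurations) for a configuration (s, f).

    trans: list of (s, a, t) triples; binder: dict mapping a variable name
    to the fixpoint subformula it identifies.
    """
    s, f = config
    op = f[0]
    if op == "false":                                   # rule 1: play finished
        return "A", []
    if op == "and":                                     # rule 2
        return "A", [(s, f[1]), (s, f[2])]
    if op == "box":                                     # rule 3
        return "A", [(t, f[2]) for (p, a, t) in trans if p == s and a in f[1]]
    if op == "nu":                                      # rule 4
        return "A", [(s, f[2])]
    if op == "true":                                    # rule 5: play finished
        return "E", []
    if op == "or":                                      # rule 6
        return "E", [(s, f[1]), (s, f[2])]
    if op == "dia":                                     # rule 7
        return "E", [(t, f[2]) for (p, a, t) in trans if p == s and a in f[1]]
    if op == "mu":                                      # rule 8
        return "E", [(s, f[2])]
    if op == "var":                                     # rule 9 (assumed attribution)
        fix = binder[f[1]]
        return ("A" if fix[0] == "nu" else "E"), [(s, fix)]
    raise ValueError(f"unknown operator: {op}")
```

An empty successor list means that no move is possible in the configuration, either because the play is finished or because the player is stuck.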

∀belard wins a play $G$, iff 

- $G = C_0 \Rightarrow \ldots \Rightarrow C_n$ and $C_n = (s, \mathit{false})$ for any state $s$. 
- $G = C_0 \Rightarrow \ldots \Rightarrow C_n$ and $C_n = (s, \langle K \rangle \psi)$ and $\nexists t : s \xrightarrow{a} t$ for any $a \in K$. 
- $G = C_0 \Rightarrow \ldots$ has infinite length and the outermost fixpoint which is 
  unwinded infinitely often is a μ-fixpoint. 

Dually, ∃loise wins a play $G$, iff a configuration with the formula true is reached, 
∀belard gets "stuck", or the outermost fixpoint which is unwinded infinitely often 
is a ν-fixpoint [24]. 

Please note that, given a transition system and a formula, there are several 
plays and these do not necessarily have the same winner. 

To characterise plays, we introduce the notion of witnesses and judgements. 
Configurations $C$ in which no move is possible are called judgements. A judgement 
is further called ∃-judgement (∀-judgement) iff it is an ∃-configuration 
(∀-configuration). For finite plays, it is obvious that ∃loise (∀belard) wins the 
play iff it contains an ∃-judgement (∀-judgement). 

Configurations of the form $(s, X)$ where $s$ is any state of the transition system 
and $X$ is a variable are called witnesses since in the following move, $X$ is 
unwinded. If $X$ is bound by a μ-quantifier, it is called a ∀-witness, otherwise 
an ∃-witness. Witnesses have a natural partial order given by the nesting within 
the originating formula. A witness $(s, X)$ is less than $(s', Y)$ iff $\sigma X.\varphi(X, Y)$ is 
a subformula of $\sigma Y.\psi(X, Y)$ where $\psi(X, Y)$ means that $\psi$ may contain the free 
variables $X, Y$. For infinite plays, it is easy to see that each has a unique maximal 
witness and that the winning condition can be formulated as: ∀belard wins iff 
this witness is a ∀-witness, ∃loise wins iff it is an ∃-witness. Please note that 
no configuration can be a judgement as well as a witness. 

A strategy is a set of rules for a Player $p$ telling her or him how to move in the 
current configuration. A winning strategy now guarantees that the play which $p$ 
plays regarding the rules will be won by $p$. [24] shows that the model checking 
problem for the μ-calculus is equivalent to finding a winning strategy for one of 
the players: Let $\mathcal{T}$ be a transition system with starting state $s$, and let $\varphi$ be a 
μ-calculus formula. $\mathcal{T}, s \models \varphi$ implies that ∃loise has a winning strategy starting 
at $(s, \varphi)$, and $\mathcal{T}, s \not\models \varphi$ implies that ∀belard has a winning strategy starting at 
$(s, \varphi)$. Since a formula either holds or is falsified, this result also implies that 
model checking games are determined, i.e., for every game, either ∀belard or 
∃loise has a winning strategy. 

All possible plays for a transition system $\mathcal{T}$ and a formula $\varphi$ are captured 
in the game graph whose nodes are the elements of the game board (the possible 
configurations) and whose edges are the players' possible moves. The game 
graph can be understood as an and-/or-graph where the or-nodes (denoted by 
∨) are ∃-configurations and and-nodes (denoted by ∧) are ∀-configurations. 
Furthermore, the notion of witnesses and judgements carries over without any 
modification. A play corresponds to a path in the game graph and vice versa. 

In the following, we concentrate on the alternation free μ-calculus. The following 
characterisation of the game graph for this fragment is useful for formulating 
a sequential algorithm and essential for our parallel algorithm. 
Theorem 2. Let $\mathcal{T}$ be a labelled transition system and $\varphi$ a formula of the alternation 
free μ-calculus. Furthermore, let $(Q, E)$ be their game graph. Then there 
exists a collection of $Q_1, \ldots, Q_m$ such that the following holds: 

1. The collection of the $Q_i$ is a partition of $Q$, i.e., $Q = \bigcup_{i \in \{1, \ldots, m\}} Q_i$ and for 
   all $i, j \in \{1, \ldots, m\}$ with $i \neq j$, it holds $Q_i \cap Q_j = \emptyset$. 
2. The subgraph induced by $Q_i$ is exactly one of 
   (a) a non-trivial maximal strongly connected component (Type I).² 
   (b) a singleton containing a judgement (Type II). 
   (c) a maximal directed acyclic graph without any judgements (Type III). 
3. Every $Q_i$ of Type I either contains at least one ∃-witness and no ∀-witnesses 
   or contains at least one ∀-witness and no ∃-witnesses. 
4. There is a partial order $\leq$ on the $Q_i$'s such that for every $q \in Q_i$ and $q' \in Q_j$ 
   with an edge from $q$ to $q'$, we have $Q_j \leq Q_i$. Thus, moves from a configuration 
   in $Q_i$ lead to configurations in either the same $Q_i$ or a lower $Q_j$. 

Proof. The proof is inspired by a characterisation of the game graph in terms of 
weak alternating automata [16]. First, consider the nodes of maximal non-trivial 
strongly connected components. These only occur because of unwinding fixpoint 
formulae. Hence, they contain witnesses. Alternation freeness now guarantees 
that these are all of the same kind, i.e. either ∃-witnesses or ∀-witnesses. Second, 
consider the leaves of the game graph, i.e. configurations without outgoing edges. 
These cannot be members of strongly connected components of Type I. By 
definition, every such $Q_i$ is a judgement. Third, it is now easy to see that all 
remaining nodes belong to directed acyclic graphs which do not contain any 
leaves of the original game graph. Please note that maximality of the strongly 
connected components guarantees the order defined to be a partial order. 

To prove our parallel algorithm to be free from deadlock, we need the following 
insight which holds since on every cycle, a fixpoint formula is unwinded. 

Proposition 1. Every strongly connected component of a game graph with more 
than one element contains at least one witness. 

Let us sketch a sequential algorithm deciding which player has a winning 
strategy [12]. It labels a configuration $q$ by green or red, depending on whether 
∃loise or ∀belard has a winning strategy for the game starting in this configuration 
$q$. It will be extended to a parallel version in the next section. 

Let us consider a game graph. By Theorem 2, there exists a partition of its 
nodes $Q$ into disjoint $Q_i$ of Type I-III, and every $Q_i$ of Type I either contains 
∃-witnesses or ∀-witnesses. Also, there exists a partial order $\leq$ on the $Q_i$ such 
that for $q \in Q_i$ and $q' \in Q_j$ for which there is a possible move from $q$ to $q'$, we 
have $Q_j \leq Q_i$. As seen before, every infinite play gets trapped within a single 
$Q_i$, and the winner depends on whether $Q_i$ contains ∃-witnesses or ∀-witnesses. 
By Prop. 1, every infinite play visits such a witness infinitely often. 

The game graph can be coloured by processing all $Q_i$ upwards according to 
the partial order. To make the algorithm deterministic, enlarge the partial order 
on the $Q_i$ to a total order. Let $Q_i$ be minimal wrt. $<$. Then it is of Type I or 
Type II. Furthermore, from any configuration of $Q_i$ every move leads to $Q_i$. If 
in $Q_i$ there is an ∃-witness or $Q_i$ consists of an ∃-judgement, all its nodes are 
labelled with green, otherwise red. 

² A component is called non-trivial if it contains at least two nodes. 

Once a configuration $q \in Q_i$ is labelled with red or green, its predecessors 
are labelled if possible. That means, an ∧-node is labelled with red if $q$ is red, 
but labelled green if all successors are green. A ∨-node is treated dually. 
Furthermore, if a node could be labelled, its outgoing edges are erased. Such 
labelling is propagated further. 

Let $Q_j$ be the next set of configurations wrt. the total order. Then, all 
configurations in every $Q_i$ with $Q_i < Q_j$ are already coloured by either red or green. If $Q_j$ is of 
Type III, all of its configurations must be labelled due to the propagation described 
before. For a set of Type I, some unlabelled configurations might remain. 
These are labelled according to the type of witnesses in $Q_j$, i.e. with red if $Q_j$ 
has ∀-witnesses, otherwise with green. 
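
The sequential colouring just described can be sketched as follows. This is an illustration under several assumptions rather than the algorithm of [12]: the partition is approximated by the strongly connected components found by Tarjan's algorithm (which emits components with no outgoing edges to unprocessed components first), leaf colours are supplied by the caller, and propagation inside a component is done by a simple repeat-until-stable loop instead of predecessor lists. All names are hypothetical.

```python
def colour_game_graph(succ, kind, leaf_colour, witness):
    """succ: node -> list of successors; kind: node -> "and" / "or";
    leaf_colour: colour of each node without successors;
    witness: node -> "E" or "A" for witness nodes only."""
    index, low, stack, on_stack, comps = {}, {}, [], set(), []

    def dfs(v):                                   # Tarjan's SCC algorithm
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                dfs(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(comp)                    # lower components come first

    for v in succ:
        if v not in index:
            dfs(v)

    colour = {}

    def decide(v):
        cols = [colour.get(w) for w in succ[v]]
        if kind[v] == "or":
            if "green" in cols:
                return "green"
            if all(c == "red" for c in cols):
                return "red"
        else:
            if "red" in cols:
                return "red"
            if all(c == "green" for c in cols):
                return "green"
        return None                               # not yet determined

    for comp in comps:
        for v in comp:
            if not succ[v]:
                colour[v] = leaf_colour[v]        # judgement
        changed = True
        while changed:                            # propagate within the component
            changed = False
            for v in comp:
                if v not in colour:
                    c = decide(v)
                    if c is not None:
                        colour[v] = c
                        changed = True
        rest = [v for v in comp if v not in colour]
        if rest:                                  # Type I: decided by the witnesses
            sorts = {witness[v] for v in comp if v in witness}
            c = "green" if sorts == {"E"} else "red"
            for v in rest:
                colour[v] = c
    return colour
```

The recursive depth-first search limits this sketch to small graphs; an iterative formulation would be needed for larger inputs.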





Fig. 1. State graph (a) and its partition (b) 

Let us consider Figure 1. Part (b) is a partition of the game graph shown 
in (a). It contains components of types I, II, and III. $Q_1$ is a directed acyclic 
graph, $Q_2$ a non-trivial maximal strongly connected component, and $Q_3$ is a 
singleton containing an ∃-judgement (denoted by +). The components can be 
ordered like $Q_3 < Q_2 < Q_1$. Since the minimal $Q_3$ contains an ∃-judgement, 
it will be labelled with green. The ∨-node from $Q_2$ to $Q_3$ and subsequently 
all nodes will therefore be labelled with green as well. Note how the ∀-witness 
(marked in the figure) has no influence here, since all nodes are coloured already. 

4 Parallel Model Checking 

Given a transition system and an $L_\mu^1$-formula, our approach is both to construct 
the game graph as well as to determine the colour of its nodes in parallel. 



4.1 Distributing the Game Graph 

It is obvious that the construction of the game graph can be carried out in parallel 
by a typical breadth-first strategy. Given a node $q$, determine its successors 
$q_1, \ldots, q_n$. Now, the successors can be processed in the same manner in parallel. 





Parallel Model Checking for the Alternation Free μ-Calculus 






However, to obtain a terminating procedure, only exactly the $q_i$ not processed 
before must be expanded. All states generated have to be stored within the 
NOW, and load sharing must be guaranteed. On a shared memory architecture, 
this does not involve big conceptual problems. For distributed memory machines, 
however, this is a little bit more difficult. 

A first idea might be to distribute the first $q_1, \ldots, q_n$ to the first $n$ processors, 
and these process the $q_i$ as described before and distribute the successors to the 
next processors. However, deciding whether a $q_i$ was processed before becomes 
an expensive operation. Every processor could have processed $q_i$ and should 
therefore be consulted. In the worst case, for every node, such a broadcast is 
required. This yields no reasonable algorithm. 

A different, often-employed way to store graphs on a distributed memory 
machine is to divide the graph's adjacency matrix $M$ into equal 
sized blocks and to store each block on a single processor [22]. This has several 
advantages. First, the blocks of the matrix can be generated in parallel. Second, 
given nodes $p, q \in V$, it is easy to check whether there is an edge from $p$ to $q$, i.e. 
whether $M_{p,q} = 1$. Since there is a unique location for the block of the matrix 
containing the value for the pair $(p, q)$, a single communication is needed. Third, 
every processor gets the same amount of data. 

For our problem, this approach cannot be used. The number of nodes of our 
graph is unknown a priori but computed while constructing the graph.³ Hence, 
the partition of the game graph into blocks cannot be determined in advance. 

We propose the following way to construct and store the graph, which is inspired 
by the work pool presentation of [15] and is similarly applied in [23]. Let $f$ 
be a function mapping the states of the game graph to a processor of our network. 
Usually, one takes a function in the spirit of a hash function assigning to every 
state an integer and subsequently its value modulo the number of processors. 
Then, $f$ determines the location of every state within the network uniquely and 
without global knowledge. In a breadth-first manner, starting with the initial 
state $q_0$ of the game graph, the state space can be constructed in parallel with 
the help of $f$ in the following way. Given a state $q$ (and possibly some of its direct 
predecessors), send it to its processor $f(q)$. If $q$ is already in the local store of 
$f(q)$, then $q$ is reached a second time, hence the procedure stops. If predecessors 
of $q$ were sent together with $q$, the list of predecessors is augmented accordingly. 
If $q$ is not in the local memory of $f(q)$, it is stored there together with the given 
predecessors as well as all its successors $q_1, \ldots, q_n$ (the states within the formula 
$\delta(q, a)$), which are computed. These are sent in the same manner to their (wrt. 
$f$) processors, together with the information that $q$ is a direct predecessor. The 
corresponding processes update their local memory similarly. 
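
The distribution scheme can be simulated in a single process. The sketch below is an illustration only: the placement function `owner` (a CRC-32 hash modulo the number of processors) is one possible choice of f, the "network" is a single message queue, and each processor's local store is a dictionary. None of these names stem from the paper.

```python
from collections import deque
import zlib

def owner(state, n_procs):
    """Hypothetical placement function f: stable hash modulo processor count."""
    return zlib.crc32(repr(state).encode()) % n_procs

def build_distributed(root, successors, n_procs):
    """Construct a graph breadth-first, storing each node at processor owner(node).

    successors: function mapping a node to the list of its successors.
    Returns one local store per processor: node -> {"preds": set, "succs": list}.
    """
    stores = [dict() for _ in range(n_procs)]
    inbox = deque([(root, None)])            # messages (node, direct predecessor)
    while inbox:
        q, pred = inbox.popleft()
        local = stores[owner(q, n_procs)]    # only the owner is consulted
        if q in local:                       # reached a second time:
            if pred is not None:             # just record the new predecessor
                local[q]["preds"].add(pred)
            continue
        succs = list(successors(q))          # expand q exactly once
        local[q] = {"preds": set() if pred is None else {pred}, "succs": succs}
        for t in succs:
            inbox.append((t, q))             # "send" t to its own processor
    return stores
```

Because the location of a node is determined by f alone, the check whether a node was processed before needs no broadcast; this is the property the text relies on.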

4.2 Labelling the Game Graph 

Given the game graph, a first possibility for labelling the nodes with red and 
green would be to apply a depth-first-search as done by sequential algorithms. 
However, since this problem is P-complete [21], there is no hope to get a suitable 
parallel algorithm by adapting the ideas of depth-first-search, and there is no 
reason to do this. 

³ In the context of model checking, the transition system is not given explicitly but 
expanded at run-time from a formal system description. 

Another possibility for labelling the nodes with red and green is applying 
typical algorithms for generating strongly connected components of our game 
graph in parallel which will be labelled in a second step [22]. However, labelling 
a connected component requires the knowledge whether the component can be 
"successfully abandoned" (cf. Section 3). It is neither clear how to obtain this 
information from the graph computed by these algorithms nor how to modify these 
algorithms in the way that this information can be gathered easily. 

We therefore propose the following method. The parallel colouring process 
is carried out speculatively. For example, given an ∃-witness $q$, it determines an 
accepting component. This node is coloured green unless there is an ∧-node 
with an edge to a lower component which is coloured red. The parallel algorithm, 
however, labels the witness $q$ with green. Furthermore, a notification is sent to 
its direct predecessors $q_1, \ldots, q_k$. This notification tells each $q_i$ that $q$ changed 
its colour. Hence, they recompute their own colour according to the following 
obvious rule: If $q_i$ is a ∨-node, then it is labelled with green if one of its 
successors is green. If all successors are red, it is labelled with red. Otherwise, 
some successors have no label yet and no colour is assigned to the current node. 
For an ∧-node, the dual is carried out. If the colour of $q_i$ has changed, it sends a 
notification to its predecessors where the same procedure starts again. Otherwise, 
the procedure is done. It is clear that the predecessors can be processed in 
parallel. The whole algorithm stops if all notifications are processed. 
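
The recomputation rule and the propagation of notifications can be sketched as below. This covers only the local rule and a sequential work queue standing in for the messages between processors; the speculative colouring of witnesses and the termination detection are left out. The colour "white" stands for "no label yet", and the function names are hypothetical.

```python
from collections import deque

def recompute(kind, succ_colours):
    """Colour of a node from the colours of its successors ("white" = unknown)."""
    if kind == "or":
        if "green" in succ_colours:
            return "green"
        if all(c == "red" for c in succ_colours):
            return "red"
    else:  # "and": the dual rule
        if "red" in succ_colours:
            return "red"
        if all(c == "green" for c in succ_colours):
            return "green"
    return "white"

def propagate(start, colour, kind, succ, preds):
    """Notify the predecessors of `start`, and transitively of every node
    whose colour changes. Mutates and returns the `colour` dictionary."""
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for p in preds.get(q, ()):
            new = recompute(kind[p], [colour[t] for t in succ[p]])
            if new != "white" and new != colour[p]:
                colour[p] = new
                queue.append(p)          # p's predecessors are notified in turn
    return colour
```

In a distributed setting, each iteration of the inner loop would correspond to a message to the processor that stores the predecessor.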

Theorem 3. The algorithm described before labels a node $(s, \varphi)$ of the game 
graph with green if $\mathcal{T}, s \models \varphi$. Otherwise, the node is labelled with red. 

Proof. We give a sketch of the proof. The termination of the algorithm can 
easily be seen by recalling that the game graph can be divided into components 
as aforementioned which are partially ordered. Labelling notifications are sent 
either within the current component or propagated to a higher one (wrt. the 
partial order). Since the colour operation is monotone (in the obvious sense), 
only a finite number of labelling notifications is generated. 

When the algorithm terminates, the game graph is entirely labelled (com- 
pleteness), because the components are either leaves (in which case they are la- 
belled as described before), they contain a witness (cf. Proposition 1), in which 
case the speculative part of the algorithm jumps in and starts the labelling of 
this component, or completely depend on a lower component. 

The sequential algorithm labels the nodes with a correct colour by processing 
the mentioned components according to the partial order. The labelling remains 
correct if the labelling of higher (wrt. the partial order) components is done 
speculatively and corrected as soon as the correct colour of the lower component is 
determined. Note that leaves and leaf components (i.e. components 
which are minimal for the partial order) are correctly labelled from the beginning. 

It should be mentioned that for an implementation, the two steps of constructing 
the game graph and labelling the nodes are carried out concurrently 
(cf. Section 4.3). The combined algorithm stops if no further labelling or state 
expansion steps have to be processed. For this task, we employ the DFG token 
termination algorithm, as presented in [5]. Due to space constraints, we will not 
go into the details of this algorithm. 

4.3 The Algorithm 

To describe our approach in more detail, we show an algorithm in pseudo code
which combines the tasks of constructing and labelling the game graph in parallel.
The termination check is omitted to simplify the presentation. This algorithm
runs on every machine within the NOW. Note that initially one machine has to
send the root of the game graph to its processor to start the procedure.

Let us consider Figure 2. Each processor's part of the graph is stored there in
a relational structure consisting of one tuple for every node already processed,
of the form (node, colour, preds, succs), where node is the node together with
its colour, its predecessors preds, and its successors together with their colours
(succs, line 3). We introduce the colour white to denote unlabelled nodes. In
line 7, a message is received. If the message requests to expand a given node, it
is checked whether this node has already been processed before (line 10). If so,
it might have a colour (not equal to white) which will be propagated to the new
predecessors (lines 12-13). In any case, new predecessors are stored (line 11).
If the delivered node has not been processed before (line 14), its successors are
computed (with an initially white colour) and propagated to the corresponding
processors (lines 15-18). Furthermore, it must be checked whether a labelling
process must be initiated, i.e. whether the current node fulfils the requirements
for being either an $\exists$- or a $\forall$-witness. If so, all predecessors are
informed about the current node's colour (lines 19-25).

The second type of messages which are received are colouring messages (line
26), informing that a node's child has changed its colour. The current settings
for node are extracted (line 27), and the new colour of the corresponding child
is stored (line 28). If the evaluation of the node's colour according to
Section 4.2 now yields a different colour than the old one, then all predecessors
are informed in the previous way (lines 29-33).

It is easy to see that the space required by our algorithm is linear in the
size of the game graph. The worst-case run-time, however, is factorial in its
size. This case might turn up when the components of the game graph are of
Type II, linearly ordered and alternating, i.e. $Q_i$ contains an $\exists$-witness
and $Q_{i+1}$ contains a $\forall$-witness. Now, in every component a (speculative)
recolouring up to the maximal (root) component may be initiated.

Since we already observed that the model checking problem for the considered
fragment of the $\mu$-calculus is unlikely to be in NC, we were warned that a
parallel algorithm might not be optimal in every case. Despite these theoretical
limitations, in practice the behaviour turns out to be feasible (cf. Section 5). An
explanation for this fact is that the aforementioned kinds of game graphs require
formulae with deeply nested fixpoints, which rarely occur in typical
specifications.
Process P

// graph : Node → (Colour × [Node] × [(Node, Colour)])

begin
  until hasTerminated do
    msg ← readMessage;
    case msg of
      Expand node pred:
        if (node, colour, oldpreds, _) in graph then
          addPreds(node, preds)
          if colour ≠ White and pred ∉ oldpreds then
            sendMessageTo (Colour pred node colour) f(pred)
        else
          succs ← computeSuccs(node)
          succss ← [ (s, White) | s in succs ]
          for s ∈ succs do
            sendMessageTo (Expand s [node]) f(s)
          colour ← case
            node is ∃-witness or ∃-judgement: Green
            node is ∀-witness or ∀-judgement: Red
            else: White
          addGraph (node, colour, preds, succss)
          if colour ≠ White
            sendMessageTo (Colour pred node colour) f(pred)
      Colour node child colour:
        (node, oldcolour, preds, succs) in graph
        updateSucc (succs, child, colour)
        newcolour ← computeColour(node, succs)
        if newcolour ≠ oldcolour then
          updateGraph (node, newcolour)
          for p ∈ preds do
            sendMessageTo (Colour p node newcolour) f(p)
end

Function computeColour (node, succs)
begin
  case
    node is ∨-node:
      case
        all (= Red) succs: Red
        any (= Green) succs: Green
        else: White
    node is ∧-node:
      case
        all (= Green) succs: Green
        any (= Red) succs: Red
        else: White
end



Fig. 2. A parallel construction and labelling algorithm 
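
The message loop of Fig. 2 can be tried out on a single machine. The following Python sketch is only an illustration under stated assumptions: a FIFO queue stands in for the network, all nodes live on one processor (so the distribution function is not needed), and the graph encoding (`succs_of`, `kind`, `initial`) as well as the small example graph are hypothetical and not taken from the paper. Termination detection is replaced by "the queue is empty".

```python
from collections import deque

WHITE, GREEN, RED = "white", "green", "red"


def compute_colour(kind, succ_colours):
    """Colour of an inner node, derived from its successors' colours."""
    colours = list(succ_colours)
    if kind == "or":
        if all(c == RED for c in colours):
            return RED
        if any(c == GREEN for c in colours):
            return GREEN
    else:  # "and" node: the dual case
        if all(c == GREEN for c in colours):
            return GREEN
        if any(c == RED for c in colours):
            return RED
    return WHITE


def label(root, succs_of, kind, initial):
    """Process Expand/Colour messages until none is left.

    succs_of: node -> list of successor nodes (assumed encoding)
    kind:     node -> "or" / "and" for inner nodes
    initial:  node -> GREEN / RED for witnesses and judgements
    """
    graph = {}  # node -> [colour, set of preds, {succ: colour}]
    queue = deque([("expand", root, None)])
    while queue:
        msg = queue.popleft()
        if msg[0] == "expand":
            _, node, pred = msg
            if node in graph:
                colour, preds, _ = graph[node]
                if pred is not None and pred not in preds:
                    preds.add(pred)
                    if colour != WHITE:
                        queue.append(("colour", pred, node, colour))
            else:
                succs = {s: WHITE for s in succs_of.get(node, [])}
                for s in succs:
                    queue.append(("expand", s, node))
                colour = initial.get(node, WHITE)
                graph[node] = [colour, set() if pred is None else {pred}, succs]
                if colour != WHITE and pred is not None:
                    queue.append(("colour", pred, node, colour))
        else:
            _, node, child, colour = msg
            entry = graph[node]
            entry[2][child] = colour
            new_colour = compute_colour(kind[node], entry[2].values())
            if new_colour != entry[0]:
                entry[0] = new_colour
                for p in entry[1]:
                    queue.append(("colour", p, node, new_colour))
    return {node: entry[0] for node, entry in graph.items()}


# Hypothetical example: 1 is an or-node, 2 an and-node, 4 and 5 are judgements.
succs_of = {1: [2, 4], 2: [4, 5]}
kind = {1: "or", 2: "and"}
initial = {4: GREEN, 5: RED}
colours = label(1, succs_of, kind, initial)
```

In this example node 4 is reached twice, so the second Expand message only registers a new predecessor and forwards the colour that is already known.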
Let us consider the graph in Figure 3 as a simple example. It suggests that
the distribution function $f$ will map the given nodes onto two processors as
shown. Starting with node 1, its successors (2, 5) are computed and sent to
processors $p_1$ and $p_2$, resp. Now, 2 and 5 can be expanded in parallel with
the effect that nodes 3 and 6 are sent to $p_2$. The successors of 3 are 5 and 4,
which are delivered to $p_2$. Since 6 is a $\forall$-witness (marked as such in the
figure), it is labelled with red, initiates a relabelling of 5, and issues an
expand-2-message to $p_1$, which notices that 2 is already expanded and registers
6 as one of 2's predecessors. $p_2$ carries on with expanding 4, noticing that it
is an $\exists$-judgement (also marked in the figure), and sends this information
to 3. Next, the red-labelling of 5 is propagated to 3 and 1. However, 3 can now
determine green as its colour and sends it to its predecessor. 2 propagates green
to 1 and 6. Finally, the whole graph is labelled green.

5 Experimental Results 

We have tested our approach within our verification platform Truth [14]. We
implemented the distribution routine on its own as well as the combined labelling
routine described in the previous section. As implementation language we have
chosen the purely functional programming language Haskell, which enabled us
to embed this algorithm in the infrastructure of our verification tool Truth and
also to prototype a concise reference implementation. The actual Haskell source
code of the algorithm has less than 300 lines of code. The communication layer
of our implementation is based upon MPICH, an implementation of the MPI
(Message Passing Interface) standard.

We now show some results we achieved when verifying properties of certain
system specifications. Figures 4 and 5 show the measured state distribution
and the speedup when running our implementation on a NOW consisting of up to
52 processors and a total of 13GB main memory. The machines are connected by
a conventional 100MBit Fast-Ethernet network.

The distribution routine shows that our approach is very well suited for
constructing large game graphs. We were able to construct graphs with several
hundred thousand states within minutes. The game graph of the largest example
we have constructed so far, a quad-parallel instance of the Alternating Bit
Protocol [18], consists of more than 1.6 million states, and we get a homogeneous
distribution of the state space on the workstations (Figure 4). The distribution
depends on our hash function $f$, and the results are quite good considering
its simplicity. In fact, all our measurements showed similar results, provided the
size of the graph is reasonably larger than the number of used workstations.
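
To make the role of the distribution function concrete, here is a minimal Python sketch of a hash-based node-to-processor mapping together with a load count per processor. The paper does not spell out its hash function, so the use of SHA-256 over `repr(node)`, the function names `owner` and `distribution`, and the synthetic node set are all assumptions made purely for illustration.

```python
import hashlib
from collections import Counter


def owner(node, n_procs):
    """Map a node to a processor index in range(n_procs), deterministically."""
    digest = hashlib.sha256(repr(node).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_procs


def distribution(nodes, n_procs):
    """Number of nodes assigned to each processor."""
    counts = Counter(owner(n, n_procs) for n in nodes)
    return [counts.get(p, 0) for p in range(n_procs)]


# Synthetic game-graph nodes: pairs (state index, subformula index).
nodes = [(state, sub) for state in range(10000) for sub in range(4)]
load = distribution(nodes, 8)
```

Because the owner of a node is a pure function of the node itself, every processor can compute the destination of an Expand message locally, without any shared directory.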

Our approach also scales very well with regard to the overall runtime
(Figure 5). Unfortunately, because of the size of the game graphs we inspected, we
did not get results when running the algorithm on fewer than five workstations
due to memory restrictions. Therefore the shown speedups are calculated
relative to 5 processors instead of one. We found that we gain a linear speedup
for reasonably large game graphs (in fact, for graphs with more than 500,000
states we even got superlinear speedups, which we will discuss later). The results
are especially satisfying if one considers that, for reasons of simplicity, we did
not try to employ well-known optimisations, for example reducing the
communication overhead by packing several states into one message.

Haskell: http://haskell.org/

MPICH: http://www-unix.mcs.anl.gov/mpi/mpich/

Fig. 4. State distribution

Due to our choice of Haskell as implementation language and its inherent
inefficiency, we did not focus on optimising the internal data structures either.
We use purely functional data structures like balanced trees and lists rather than
destructively updateable arrays or hash tables. This is also the reason for the
superlinear speedups noted above. We found that the overhead for insertions
and lookups on our internal data structures increases dramatically with the
number of stored states. We verified this by running all processes on a single
processor in parallel and replacing the network message passing with inter-process
communication. The expected result would have been to find runtimes similar
to those one process would achieve in this situation, or even slightly worse due
to operating system context switches between the processes running in parallel.
But we found that there is a significant speedup because the internal data
structures are less crowded so that lookups and insertions are considerably cheaper.

Comparing our approach to the implemented sequential game-based
depth-first-search model checking algorithm for this logic, we find that it is
not possible to beat it on small examples. There are two reasons for that. First,
in many cases a formula can be proven or falsified by considering only a part of
the game graph, and parallel computing power does not help in these cases.
Second, the communication between processors within a NOW is dramatically
slower than accessing memory. Hence, as long as a problem fits into main memory,
it is difficult to beat a sequential algorithm by a parallel one running on a NOW.

The situation changes completely when most of a huge game graph has to be
checked for proving a formula. This situation arises for example in the frequent
case that a NoDeadlock formula is considered. To check whether a system
description contains any deadlock requires the whole game graph to be analysed.
For large systems (several hundred thousand states), the parallel version beats
the sequential one. More importantly, we were able to verify certain systems with
the help of our parallel algorithm while the sequential one failed due to memory
restrictions.

6 Conclusion 

In this paper, we presented a parallel game-based model checking algorithm for
an important fragment of the $\mu$-calculus. The demand for parallel algorithms
becomes visible by considering the memory and run-time consumption of
sequential algorithms. Since the employed fragment of the $\mu$-calculus subsumes
the well-known logic CTL, it is of high practical interest. We have implemented
our approach within the verification platform Truth. Systems with a million
states could be constructed within half an hour on a NOW consisting of up to 52
processors. We found that the algorithm scales very well wrt. run-time and
memory consumption when enlarging the NOW. Furthermore, the distribution
of states on the processors is homogeneous.

Comparing it to the on-the-fly sequential model checking algorithm that we
also implemented for this logic, we learned that for simple examples a parallel
global algorithm cannot outperform a local one. This is especially true when
formulae like $true \vee \varphi$ are checked, where $\varphi$ yields a considerably
large part of the resulting game graph. A local algorithm would be able to present
a solution almost instantaneously, since the formula is dominated by an
$\exists$-judgement (true).

To improve our algorithm for such cases, we head towards an "almost local"
variant, which not only uses two colours but colour weights, with which e.g.
the propagation of safe colours (resulting from minimal components $Q_i$) can be
tracked better, so that we can eventually short-circuit the colouring process.

However, considering real-world specifications yielding millions of states, even
the parallel algorithm presented here gains the upper hand. Answers are
computed more quickly, and, more importantly, there are numerous cases in which
the sequential algorithm fails because of memory restrictions and the parallel
version is able to prove a formula. From the practical point of view, it is a
central feature of a verification tool to give an answer in as many cases as possible.

While our approach is already of practical interest since it allows larger
systems to be checked, it should also be considered as a further attempt to develop
parallel model checking algorithms. More research should be carried out in this
direction. In particular, on-the-fly model checking and partial order reduction [20]
should be analysed with respect to parallelisation. Furthermore, different
(especially non-P-complete) specification logics might provide better parallel
model checking algorithms.
Abstract. We define a logic called CTL*[DC] which extends CTL* with the
ability to specify past-time and quantitative timing properties using the
formulae of Quantified Discrete-time Duration Calculus (QDDC).
Alternatively, we can consider CTL*[DC] as extending logic QDDC with
branching and liveness.

As our main result, we show a reduction of the CTL*[DC] model checking
problem to model checking of CTL* formulae. The reduction relies upon
an automata-theoretic decision procedure for QDDC. Moreover, it
preserves the subsets CTL and LTL of CTL*. The reduction is of practical
relevance as model checking of CTL* as well as its subsets CTL and LTL
are well studied and even implemented into a number of tools. We briefly
discuss an implementation of a model checking tool for CTL[DC] called
CTLDC, based on the above theory. CTLDC can model check SMV,
Verilog and Esterel designs using tools SMV, VIS and Xeve, respectively.



1 Introduction 

Logic CTL* is an expressive logic for the specification of properties of transition
systems [8]. It has path quantifiers for specifying branching time properties as
well as temporal operators for specifying how the state of the system evolves along
execution paths. For example, the following formula states that on all execution
paths proposition P will hold infinitely often. (We only provide an intuitive
explanation of what the properties state. A precise definition of the syntax and
semantics of CTL* operators is given in Section 3.)

AGF P 

Model checking algorithms for verifying CTL* properties of finite state transition 
systems are well studied [8]. Moreover, subsets CTL [4,5] and LTL [25] of CTL* 
have also been formulated and thoroughly investigated. Symbolic model checking 
algorithms for verifying formulae of these sub-logics have been implemented in 
tools such as SMV [18], VIS [2] and TLV [12]. 

In spite of its expressive abilities, there are situations where CTL* is
restrictive. It has long been recognised [17] that availability of past modalities in
temporal logics considerably facilitates formulation of complex properties.
Secondly, specification of reactive systems must often deal with quantitative timing
constraints [9]. In this paper, we will address these issues in an integrated fashion.

* Partially supported by the UNU/IIST offshore project Semantics and verification of
real-time programs using Duration Calculus: Theory and Practice

T. Margaria and W. Yi (Eds.): TACAS 2001, LNCS 2031, pp. 559-573, 2001.

Quantified Discrete-time Duration Calculus (QDDC) [21,22] is a highly expressive
logic for specifying quantitative timing properties of finite sequences of states
(behaviours). It is closely related to the Interval Temporal Logic of Moszkowski
[19] and Duration Calculus of Zhou et al [26]. It provides novel interval based
modalities for describing behaviours. For example, the following formula holds
for a behaviour $\sigma$ provided for all fragments $\sigma'$ of $\sigma$ which have
(a) P true in the beginning, (b) Q true at the end, and (c) no occurrences of Q in
between, the number of occurrences of states in $\sigma'$ where R is true is at most 3.

Here, the $\Box$ modality ranges over all fragments of a behaviour. Operator
$\frown$ is like concatenation (fusion) of behaviour fragments and
$\lceil\lceil \neg Q \rceil$ states invariance of $\neg Q$ over the behaviour
fragment. Finally, $\Sigma R$ counts the number of occurrences of R within a
behaviour fragment. A precise definition of the syntax and semantics of QDDC is
given in Section 2. Formula $\eta = 3$ states that the behaviour fragment has
length 3 (i.e. it spans a sequence of 4 states). QDDC is a convenient and highly
expressive formalism for specifying quantitative timing properties. However, it
cannot specify liveness or branching.

An automata-theoretic decision procedure allows checking of satisfiability
(validity) of QDDC formulae [22]. This algorithm has been implemented into
a tool called DCVALID [21]. The tool is built on top of MONA [11], which
is an efficient and sophisticated BDD-based implementation of the Büchi-Elgot
automata-theoretic decision procedure for Monadic Logic over Finite Words [3,7].
(See [13] for a recent paper on MONA.)

In this paper, we propose a straightforward extension of CTL* where, in
place of propositions, formulae of Quantified Discrete-time Duration Calculus
QDDC can be asserted within CTL* formulae. A QDDC formula D holds for
a node of a computation tree provided the unique path from the root of the
tree to the node satisfies D. Thus, a QDDC formula D allows specification of
the "past" of the node. Operators of CTL* allow specification of branching and
liveness properties.

For example, the following formula states that on all execution paths QDDC 
formula D will become true infinitely often. 

AGF D 

The following formula states that once there is overload for 5 steps, there will 
be alarm until reset occurs. 

$AG\big((true \frown (\lceil\lceil Overload \rceil \wedge \eta = 5)) \Rightarrow A(alarm \; U \; reset)\big)$

Logic QDDC provides a useful extension to the expressive power of CTL*
by allowing past-time and quantitative timing properties to be expressed. It also
significantly increases the expressive power of QDDC which is unable to specify
liveness properties such as infinitely often D or branching. In a separate note [24],
we list a number of real-life properties collected from model checking literature
which are stated to be hard to formulate in CTL. We show that these properties
can be easily captured using CTL[DC].

As our main result, we show a reduction of the CTL*[DC] model checking
problem to the model checking of pure CTL* formulae by effectively
transforming the transition system and the property. This transformation relies on
the automata-theoretic decision procedure for QDDC. Thus, in effect, we show that
by combining the model-checking procedures for CTL* and QDDC, we obtain a
model-checking procedure for CTL*[DC].

Our reduction of CTL*[DC] model checking to CTL* model checking
preserves the subsets CTL and LTL of CTL*. That is, a CTL[DC] formula reduces
to a CTL formula and an LTL[DC] formula reduces to an LTL formula. This
reduction is of practical relevance as model checking of CTL* [8] as well as its
subsets CTL [5] and LTL [16] are well studied and even implemented into a number
of tools such as SMV [18,6], VIS [2] and TLV [12]. Based on this reduction, we have
implemented a model checking tool for CTL[DC] called CTLDC [23]. It permits
model checking of SMV, Verilog and Esterel designs using SMV [18], VIS [2] and
Xeve [1] tools, respectively.

CTLDC permits well established CTL model checking tools to be used for 
analysing complex properties involving past and quantitative timing constraints. 
While there have been several theoretical formulations extending LTL and CTL 
with past [14,15], CTLDC constitutes perhaps the first implementation of CTL 
with past. Moreover, the fact that we can integrate our approach with a wide 
variety of design notations such as SMV, Verilog, Esterel, and tools such as 
SMV, VIS, Xeve shows that the approach is rather generic and easy to build 
from components. 

The rest of the paper is organised as follows. We provide a brief overview of
the logic QDDC in Section 2. The syntax and semantics of CTL*[DC] are given in
Section 3. The reduction from model checking of CTL*[DC] to model checking
of CTL* is given in Section 4. Finally, some examples of use of CTLDC, the
model checker for CTL[DC], are described in Section 5. We conclude the paper
with a brief discussion.

2 Quantified Discrete-Time Duration Calculus (QDDC) 

Let $Pvar$ be a finite set of propositional variables representing some observable
aspects of system state. Let

$VAL(Pvar) = Pvar \to \{0, 1\}$

be the set of valuations assigning a truth value to each variable.

We shall identify behaviours with finite, nonempty sequences of valuations,
i.e. $VAL(Pvar)^+$.
Example 1. The following picture gives a behaviour over variables $\{p, q\}$. Each
column vector gives a valuation, and the word is a sequence of such column
vectors.

p 1 0 1 1 0

q 0 0 0 0 1

The above word satisfies the property that p holds initially and q holds at the 
end but nowhere before that. QDDC is a logic for formalising such properties. 
Each formula specifies a set of such words. 
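
As a small illustration, the behaviour of Example 1 can be written down as a list of valuations and the informal property checked directly. This Python sketch is not part of QDDC itself; the dictionary encoding of valuations and the helper name `p_first_q_only_last` are assumptions chosen for readability.

```python
# Behaviour of Example 1: one valuation (variable -> 0/1) per position.
sigma = [
    {"p": 1, "q": 0},
    {"p": 0, "q": 0},
    {"p": 1, "q": 0},
    {"p": 1, "q": 0},
    {"p": 0, "q": 1},
]


def p_first_q_only_last(behaviour):
    """True iff p holds at the first position and q holds at the last
    position but at no earlier position."""
    if not behaviour:
        return False  # behaviours are nonempty by definition
    q_before_end = any(v["q"] for v in behaviour[:-1])
    return bool(behaviour[0]["p"]) and bool(behaviour[-1]["q"]) and not q_before_end


example_1_holds = p_first_q_only_last(sigma)
```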

Given a non-empty finite sequence of valuations $\sigma \in VAL(Pvar)^+$, we denote
the satisfaction of a QDDC formula $D$ over $\sigma$ by

$\sigma \models D$

We now give the syntax and semantics of QDDC and define the above satisfaction
relation.

Syntax of QDDC Formulae. Let $Pvar$ be the set of propositional variables. Let
$p$ range over propositional variables, $P, Q$ over propositions and $D, D_1, D_2$
over QDDC formulae.

The set of propositions $Prop$ has the syntax

$0 \mid 1 \mid p \mid P \wedge Q \mid \neg P$

Operators such as $\vee$, $\Rightarrow$ and $\Leftrightarrow$ can be defined as usual.

The syntax of QDDC is as follows.

$\lceil P \rceil^0 \mid \lceil\lceil P \rceil \mid D_1 \frown D_2 \mid D_1 \wedge D_2 \mid \neg D \mid \exists p. D \mid \eta \; op \; c \mid \Sigma P \; op \; c$ where $op \in \{>, =\}$

Let $\sigma \in VAL(Pvar)^+$ be a behaviour. Let $\#\sigma$ denote the length of
$\sigma$ and $\sigma[i]$ the $i$-th element. For example, if $\sigma = (v_0, v_1, v_2)$
then $\#\sigma = 3$ and $\sigma[1] = v_1$. Let $dom(\sigma) = \{0, 1, \ldots, \#\sigma - 1\}$
denote the set of positions within $\sigma$. The set of intervals in $\sigma$ is given
by $Intv(\sigma) = \{[b, e] \in dom(\sigma)^2 \mid b \le e\}$ where each interval
$[b, e]$ identifies a subsequence of $\sigma$ between positions $b$ and $e$.

Let $\sigma, i \models P$ denote that proposition $P$ evaluates to true at position
$i$ in $\sigma$. We omit this obvious definition. We inductively define the
satisfaction of QDDC formula $D$ for behaviour $\sigma$ and interval
$[b, e] \in Intv(\sigma)$ as follows.

$\sigma, [b, e] \models \lceil P \rceil^0$ iff $b = e$ and $\sigma, b \models P$

$\sigma, [b, e] \models \lceil\lceil P \rceil$ iff $b < e$ and $\sigma, i \models P$ for all $i : b \le i < e$

$\sigma, [b, e] \models \neg D$ iff $\sigma, [b, e] \not\models D$

$\sigma, [b, e] \models D_1 \wedge D_2$ iff $\sigma, [b, e] \models D_1$ and $\sigma, [b, e] \models D_2$

$\sigma, [b, e] \models D_1 \frown D_2$ iff for some $m : b \le m \le e$ :
$\sigma, [b, m] \models D_1$ and $\sigma, [m, e] \models D_2$
Entities $\eta$ and $\Sigma P$ are called measurements. Term $\eta$ denotes the
length of the interval whereas $\Sigma P$ denotes the number of times $P$ is true
within the interval $[b, e]$ (we treat the interval as being left-closed right-open).
Formally,

$eval(\eta, \sigma, [b, e]) = e - b$

$eval(\Sigma P, \sigma, [b, e]) = \sum_{i=b}^{e-1} \begin{cases} 1 & \text{if } \sigma, i \models P \\ 0 & \text{otherwise} \end{cases}$

Let $t$ range over measurements. Then,

$\sigma, [b, e] \models t \; op \; c$ iff $eval(t, \sigma, [b, e]) \; op \; c$

Call a behaviour $\sigma'$ a $p$-variant of $\sigma$ provided $\#\sigma = \#\sigma'$
and for all $i \in dom(\sigma)$ and for all $q \ne p$, we have
$\sigma(i)(q) = \sigma'(i)(q)$. Then,

$\sigma, [b, e] \models \exists p. D$ iff $\sigma', [b, e] \models D$ for some $p$-variant $\sigma'$ of $\sigma$

Finally,

$\sigma \models D$ iff $\sigma, [0, \#\sigma - 1] \models D$
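
The inductive definition above translates almost literally into a recursive evaluator. The Python sketch below is a naive illustration only and unrelated to the automata-based procedure used by DCVALID: formulae are encoded as nested tuples with tags such as `"point"`, `"inv"` and `"chop"`, propositions as Python predicates on a valuation, and `∃p` is handled by brute-force enumeration of all p-variants. All of these encodings and names are assumptions.

```python
import operator
from itertools import product

OPS = {">": operator.gt, "=": operator.eq}


def holds(sigma, b, e, d):
    """Decide sigma, [b, e] |= d for a formula d given as a nested tuple."""
    tag = d[0]
    if tag == "point":   # P holds and the interval is a point
        return b == e and bool(d[1](sigma[b]))
    if tag == "inv":     # P holds at b, ..., e-1 and b < e
        return b < e and all(d[1](sigma[i]) for i in range(b, e))
    if tag == "not":
        return not holds(sigma, b, e, d[1])
    if tag == "and":
        return holds(sigma, b, e, d[1]) and holds(sigma, b, e, d[2])
    if tag == "chop":    # split [b, e] at some m into [b, m] and [m, e]
        return any(holds(sigma, b, m, d[1]) and holds(sigma, m, e, d[2])
                   for m in range(b, e + 1))
    if tag == "len":     # eta op c
        return OPS[d[1]](e - b, d[2])
    if tag == "count":   # (Sigma P) op c, counted over the positions b, ..., e-1
        n = sum(1 for i in range(b, e) if d[1](sigma[i]))
        return OPS[d[2]](n, d[3])
    if tag == "exists":  # exists p. D, trying every p-variant of sigma
        p, body = d[1], d[2]
        for bits in product((0, 1), repeat=len(sigma)):
            variant = [dict(v, **{p: bit}) for v, bit in zip(sigma, bits)]
            if holds(variant, b, e, body):
                return True
        return False
    raise ValueError(f"unknown construct: {tag!r}")


def satisfies(sigma, d):
    """sigma |= d, i.e. d holds on the full interval [0, #sigma - 1]."""
    return holds(sigma, 0, len(sigma) - 1, d)


# The behaviour of Example 1 and the property "p first, q only at the end".
sigma = [{"p": 1, "q": 0}, {"p": 0, "q": 0}, {"p": 1, "q": 0},
         {"p": 1, "q": 0}, {"p": 0, "q": 1}]
P = lambda v: v["p"] == 1
Q = lambda v: v["q"] == 1
NOT_Q = lambda v: v["q"] == 0
prop = ("chop", ("point", P), ("chop", ("inv", NOT_Q), ("point", Q)))
```

The evaluator is exponential in the number of quantified variables and is only meant to make the semantics tangible on short behaviours.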

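We can also define some derived constructs. Boolean combinators $\vee$,
$\Rightarrow$, $\Leftrightarrow$ can be defined using $\wedge$, $\neg$ as usual.

- $\lceil\lceil P \rceil\rceil \stackrel{\mathrm{def}}{=} (\lceil\lceil P \rceil \frown \lceil P \rceil^0)$ states that proposition $P$ holds invariantly over the
closed interval $[b, e]$ including the endpoint.

- $\lceil\,\rceil \stackrel{\mathrm{def}}{=} \lceil 1 \rceil^0$ holds for point intervals of the form $[b, b]$.

- $ext \stackrel{\mathrm{def}}{=} \neg\lceil\,\rceil$ holds for extended intervals $[b, e]$ with $b < e$.

- $unit \stackrel{\mathrm{def}}{=} ext \wedge \neg(ext \frown ext)$ holds for intervals of the form $[b, b + 1]$.

- $\Diamond D \stackrel{\mathrm{def}}{=} true \frown D \frown true$ holds provided $D$ holds for some subinterval.

- $\Box D \stackrel{\mathrm{def}}{=} \neg\Diamond\neg D$ holds provided $D$ holds for all subintervals.

- $t \ge c \stackrel{\mathrm{def}}{=} t = c \vee t > c$. Also, $t < c \stackrel{\mathrm{def}}{=} \neg(t \ge c)$.

- $D^* \stackrel{\mathrm{def}}{=} (\exists p. \; (\lceil p \rceil^0 \frown true \frown \lceil p \rceil^0) \wedge \Box((\lceil p \rceil^0 \frown unit \frown (\lceil\lceil \neg p \rceil \vee \lceil\,\rceil) \frown \lceil p \rceil^0) \Rightarrow D))$

Formula $D^*$ represents Kleene-closure of $D$ under the $\frown$ operator. It
states that $D^*$ holds for interval $[b, e]$ if there exists a partition of $[b, e]$
into a sequence of sub-intervals such that $D$ holds for each sub-interval. Each
sub-interval is characterised by $p$ holding at both endpoints and nowhere in
between, i.e. satisfying the formula
$(\lceil p \rceil^0 \frown unit \frown (\lceil\lceil \neg p \rceil \vee \lceil\,\rceil) \frown \lceil p \rceil^0)$.
It is not difficult to see that

$\sigma, [b, e] \models D^*$ iff $b = e$, or $b < e$ and there exist $n, b_0, \ldots, b_n$ such that
$b = b_0$, $b_n = e$, and for all $0 \le i < n$: $b_i < b_{i+1}$ and $\sigma, [b_i, b_{i+1}] \models D$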
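
The partition characterisation of $D^*$ can be checked directly by a left-to-right sweep that records which positions are reachable as partition points. The Python sketch below assumes that the truth of $D$ on a sub-interval is supplied as a hypothetical callback `check(b, e)`; it illustrates the characterisation only, not the quantifier-based definition.

```python
def holds_star(check, b, e):
    """True iff [b, e] is a point interval, or can be split into consecutive
    sub-intervals [b_i, b_(i+1)] with b_i < b_(i+1) such that
    check(b_i, b_(i+1)) holds for each of them."""
    if b == e:
        return True
    reachable = {b}  # positions m such that [b, m] already has a valid partition
    for m in range(b + 1, e + 1):
        if any(check(k, m) for k in reachable):
            reachable.add(m)
    return e in reachable


# D is "eta = 2": every sub-interval must have length exactly 2.
length_two = lambda b, e: e - b == 2
even_length = holds_star(length_two, 0, 4)
odd_length = holds_star(length_two, 0, 3)
```

With `length_two` as the sub-interval test, the starred formula holds exactly on intervals of even length, which is the pattern used in Example 3 later on.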
Decidability of QDDC. The following theorem characterises the sets of models
of a QDDC formula. Let $pvar(D)$ be the finite set of propositional variables
occurring within a QDDC formula $D$. Let $VAL(Pvar) = Pvar \to \{0, 1\}$ be the
set of valuations over $Pvar$.

Theorem 1. For every QDDC formula $D$, we can effectively construct a finite
state automaton $A(D)$ over the alphabet $VAL(pvar(D))$ such that for all
$\sigma \in VAL(pvar(D))^+$,

$\sigma \models D$ iff $\sigma \in L(A(D))$

We omit the proof of this theorem as it can be found elsewhere [22]. In outline,
the proof relies on the following steps. Firstly, we can eliminate all measurement
formulae of the form $\eta \; op \; c$ and $\Sigma P \; op \; c$ from a QDDC formula
$D$ and find an equivalent formula without these. Such formulae are said to belong
to subset QDDCR. Next we embed QDDCR into monadic logic over finite words. Thus
for every formula $D \in$ QDDCR we construct a formula of monadic logic
over finite words which has the same set of models as $D$. This embedding was
first presented by Pandya [20]. Finally, the famous theorem due to Büchi [3] and
Elgot [7] states that for every formula $\psi$ of monadic logic over finite words, we
can construct a finite state automaton which accepts exactly the word models
of $\psi$. By combining these steps, we can obtain the automaton $A(D)$ accepting
word models of QDDC formula $D$. □

Corollary 1. Satisfiability (validity) of QDDC formulae is decidable. 

Proof outline. For checking satisfiability of $D \in$ QDDC we can construct the
automaton $A(D)$. A word satisfying the formula can be found by searching for
an accepting path within $A(D)$. Such a search can be carried out in time linear
in the size (number of nodes + edges) of $A(D)$ by depth-first search. □
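
The search for an accepting path can be sketched as an iterative depth-first search with parent pointers, so that each state and edge is visited at most once. In the Python sketch below, the automaton encoding (`delta` as a dictionary of dictionaries) and the small hand-written automaton are assumptions; the automaton is meant to mirror the property of Example 1 and is not output of DCVALID.

```python
def find_accepted_word(initial, accepting, delta):
    """Return some word leading from `initial` to an accepting state,
    or None if no accepting state is reachable.
    delta: state -> {letter: successor state} (missing entries reject)."""
    parent = {initial: None}  # state -> (previous state, letter read)
    stack = [initial]
    while stack:
        state = stack.pop()
        if state in accepting:
            word = []
            while parent[state] is not None:
                state, letter = parent[state]
                word.append(letter)
            return word[::-1]
        for letter, target in delta.get(state, {}).items():
            if target not in parent:
                parent[target] = (state, letter)
                stack.append(target)
    return None


def run(delta, state, word):
    """State reached after reading `word`, or None if the run gets stuck."""
    for letter in word:
        state = delta.get(state, {}).get(letter)
        if state is None:
            return None
    return state


# Letters are pairs (p, q). State 2 is accepting.
delta = {
    0: {(1, 0): 1},
    1: {(0, 0): 1, (1, 0): 1, (0, 1): 2, (1, 1): 2},
    2: {},
}
witness = find_accepted_word(0, {2}, delta)
```

Which witness is returned depends on the order in which edges are explored; any returned word is accepted by the automaton.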



Example 2. The property of Example 1 can be stated in QDDC as formula
$\lceil P \rceil^0 \frown \lceil\lceil \neg Q \rceil \frown \lceil Q \rceil^0$. The automaton
corresponding to this formula is given below. Each edge is labelled with a column
vector giving truth values of variables P, Q as in Example 1. Also, letter X is used
to denote either 0 or 1.

[Automaton diagram not reproduced.]
DCVALID. The reduction from formulae of QDDC to finite state automata as
outlined in Theorem 1 has been implemented into a tool called DCVALID [21],
which also checks for the validity of formulae as in Corollary 1. This tool is built
on top of MONA [11,13]. MONA is a sophisticated and efficient BDD-based
implementation of the automata-theoretic decision procedure for monadic logic
over finite words [3,7]. DCVALID works by reducing QDDC formulae to this
logic [22]. The automaton in Example 2 was automatically generated from the
formula by this tool.

Complexity. It must be noted that there is a non-elementary lower bound on the
size of the automaton $A(D)$ accepting word models of a QDDC formula $D$. In
the worst case, the complexity of the output automaton can increase by one
exponent for each alternation of $\neg$ and $\frown$ operators. However, such
blowup is rarely observed in practice and we have been able to check validity of
many formulae which are 5-6 pages long with our tool DCVALID [21,22].

3 Logic CTL*[DC] 

A transition system (also called Kripke structure) is a quadruple $(S, R, L, S_0)$
where

$S$ is the set of states,

$S_0 \subseteq S$ is the set of initial states,

$R \subseteq S \times S$ is the transition relation,

$L : S \to VAL(Pvar)$ is the labelling function.

Recall that $VAL(Pvar) = Pvar \to \{0, 1\}$ is the set of valuations over $Pvar$.
The labelling function $L(s)$ gives the truth value of propositional variables at
state $s$.

Syntax. We have three sorts of formulae: QDDC formulae, path formulae and
state formulae. Let $P$ range over propositions. Let $D$ range over QDDC
formulae; $\alpha, \beta$ range over path formulae and $\phi, \psi$ range over state
formulae.

Let $L(s) \models P$ denote that proposition $P$ evaluates to true in state $s$ with
labelling function $L$. Also, for a given nonempty sequence of states
$(s_0, \ldots, s_n)$, let $L((s_0, \ldots, s_n)) = (L(s_0), \ldots, L(s_n))$ give the
corresponding sequence of valuations in $VAL(Pvar)^+$. Hence for QDDC formula
$D$ over $Pvar$, we can define $L((s_0, \ldots, s_n)) \models D$ as in Section 2.

State Formulae of CTL*[DC]

$P \mid D \mid A\alpha \mid E\alpha \mid \neg\phi \mid \phi \wedge \psi$

Path Formulae of CTL*[DC]

$\phi \mid \alpha \; U \; \beta \mid X\alpha \mid \alpha \wedge \beta \mid \neg\alpha$

We can define some abbreviations for the path formulae. Let
$F\alpha \stackrel{\mathrm{def}}{=} true \; U \; \alpha$ and $G\alpha \stackrel{\mathrm{def}}{=} \neg F \neg\alpha$.




566 Paritosh K. Pandya 



Given $M = (S, R, L, S_0)$ and $s \in S$, let $Tr(M, s)$ denote the (unique) tree
obtained by unfolding the state graph $M$ starting with state $s$. In this tree,
each node is labelled by a state from $S$. Moreover, a node $n$ labelled $s$ has as
its immediate successors nodes labelled with $s'$ for each distinct $s'$ such that
$R(s, s')$. We call such a tree a computation tree. Let $St(T, n)$ denote the state
labelling the node $n$ of tree $T$.

Given a computation tree $T$ and an internal node $n$, let $hist(n)$ denote the
finite sequence of states $s_0, \ldots, s_k$ labelling the nodes on the unique path from
the root to $n$. A trajectory from $n_0$ is an infinite sequence of nodes
$n_0, n_1, \ldots$ going into the future. Let $paths(n)$ be the set of all trajectories
starting from $n$.

It should be noted that the label $s$ of a node uniquely defines the subtree
under it. However, distinct nodes $n_1$ and $n_2$ with the same state label will have
distinct $hist(n)$. The truth of formulae in our logic CTL*[DC] will depend upon
both the subtree at $n$ as well as $hist(n)$.

We now define the truth of state and path formulae. Let $T = Tr(M, s)$ be a
computation tree. Let $n$ be a node of $T$ and let $\rho = n_0, n_1, \ldots$ be a
trajectory in $T$. Then, the truth of state formula $T, n \models \phi$, and the truth
of path formula $T, \rho \models \alpha$ are defined as follows.

State formulae:

$T, n \models P$ iff $L(St(T, n)) \models P$

$T, n \models D$ iff $L(hist(n)) \models D$

$T, n \models E\alpha$ iff $T, \rho \models \alpha$ for some $\rho \in paths(n)$

$T, n \models A\alpha$ iff $T, \rho \models \alpha$ for all $\rho \in paths(n)$

The boolean combinators have their usual meaning.

Path formulae: Let $\rho = n_0, n_1, \ldots$ denote a trajectory in $T$ starting at a
(not necessarily root) node $n_0$. For any $m \in Nat$, let $\rho^m$ denote the suffix
$n_m, n_{m+1}, \ldots$ of $\rho$ starting with node $n_m$.

$T, \rho \models \phi$ iff $T, n_0 \models \phi$

$T, \rho \models X\alpha$ iff $T, \rho^1 \models \alpha$

$T, \rho \models \alpha \; U \; \beta$ iff for some $m \in Nat$,
$T, \rho^m \models \beta$, and $T, \rho^j \models \alpha$ for $j : 0 \le j < m$

Finally,

$T \models \phi$ iff $T, n_r \models \phi$ where $n_r$ is the root of the tree $T$

$M, s \models \phi$ iff $Tr(M, s) \models \phi$

$M \models \phi$ iff $M, s \models \phi$ for all $s \in S_0$

Subset CTL[DC] In this subset every temporal operator X, U, G, F is preceded
by a path quantifier A, E. If path formulae are restricted to the following syntax,
we have the subset CTL[DC]:

    $\phi \, U \, \psi \mid X\phi \mid F\phi \mid G\phi$   where $\phi, \psi$ are state formulae

Subset LTL[DC] In this subset the formula is of the form $A\alpha$ where the path
formula $\alpha$ is free of path quantifiers A, E. If path formulae are restricted to the
following syntax, we have the subset LTL[DC]:

    $P \mid D \mid \alpha \, U \, \beta \mid X\alpha \mid \alpha \land \beta \mid \neg\alpha$

Subsets CTL*, CTL and LTL If a formula of CTL*[DC] does not contain any
QDDC formula, then it is called a CTL* formula. Similarly, we can define CTL
and LTL formulae.

Example 3. The following CTL[DC] formula states that on all nodes of the
computation tree which are at even distance from the root, proposition P must be
true. Nothing is said about the truth of P on nodes which are at odd distance
from the root.

    $AG((\eta = 2)^{*} \Rightarrow P)$

This property cannot be expressed in the logic CTL*. Thus our extension increases
the expressive power of the logic CTL*.

4 Decidability of Model Checking CTL*[DC] 

Given a transition system $M$ and a CTL*[DC] formula $\phi$, we construct a
transformed transition system $M'$ and a CTL* formula $\phi'$ such that

    $M \models \phi$  iff  $M' \models \phi'$        (1)

Thus, we reduce the model checking of CTL*[DC] to model checking of CTL*.
In the rest of this section, we will define this transformation and prove its
correctness.

Let the transition system be $M = (S, R, L, S_0)$ and the CTL*[DC] formula be
$\phi(D_1, \ldots, D_n)$, where $D_1, \ldots, D_n$ are the syntactically distinct QDDC
formulae occurring within $\phi$. We construct the transformed transition system $M'$ as
follows.

Let $A(D_i)$ be the automaton recognising models of $D_i$ as in Theorem 1. Such
an automaton is called a synchronous observer for $D_i$. We assume that $A(D_i)$ is in
total and deterministic form. By this we mean that from any state $q$ and for any
valuation $v \in VAL(Pvar(D_i))$ there is a unique transition leading to the state
given by a total function $\delta_i(q, v)$.

We define the synchronous product of $M$ with the list of automata $A(D_i)$.
Since each $A(D_i)$ is a finite-state acceptor and $M$ is a Moore machine, we define
the product such that we get a Moore machine. Let $A(D_i) = (Q_i, \delta_i, q_i^0, F_i)$. If
$M$ starts in state $s \in S_0$, the observers $A(D_i)$ observe this state and go to the states
$q_i^s = \delta_i(q_i^0, L(s))$ respectively. Also, if $M$ moves from state $s$ to $s'$, each $A(D_i)$
moves from state $q_i$ to $\delta_i(q_i, L(s'))$. The observable propositions of the resulting
system are valuations over $Pvar \cup \{End_i \mid 1 \le i \le n\}$. Proposition $End_i$ holds
when automaton $A(D_i)$ is in one of its final states. Thus, $End_i$ holds precisely when the
behaviour from start up to the current node satisfies the formula $D_i$. Formally,
let



    $M' = (S', R', L', S'_0)$

where

    $S' = S \times Q_1 \times \ldots \times Q_n$
    $S'_0 = \{(s, q_1, \ldots, q_n) \mid s \in S_0 \text{ and } q_i = \delta_i(q_i^0, L(s))\}$
    $R' = \{((s, q_1, \ldots, q_n), (s', q'_1, \ldots, q'_n)) \mid R(s, s') \land q'_i = \delta_i(q_i, L(s'))\}$
    $L'((s, q_1, \ldots, q_n)) = L(s) \cup \{End_i \mapsto (q_i \in F_i)\}$

The transformed formula $\phi(End_1, \ldots, End_n)$ is obtained by replacing each
occurrence of a QDDC sub-formula $D_i$ by a proposition $End_i$ which witnesses the
truth of $D_i$.
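
The construction of M' can be illustrated by a minimal Python sketch. It is not
the CTLDC implementation: the function name `synchronous_product`, the encoding
of observers as `(q0, delta, finals)` triples and the two-state example are all
hypothetical choices made for illustration. The example observer is a hand-written
automaton for the formula of Example 3 (histories at even distance from the root),
and only the reachable part of M' is built.

```python
def synchronous_product(init, trans, label, observers):
    """Build the reachable part of M' from M and total deterministic observers.

    init      -- set of initial states of M
    trans     -- set of (s, s') pairs (the relation R)
    label     -- dict: state -> set of propositions true in it (L)
    observers -- dict: name -> (q0, delta, finals), delta(q, valuation) -> q'
    """
    names = sorted(observers)

    def step(qs, valuation):
        # Every observer reads the same valuation and moves synchronously.
        return tuple(observers[n][1](q, valuation) for n, q in zip(names, qs))

    q0s = tuple(observers[n][0] for n in names)
    # Initial states: each observer has already consumed L(s).
    init_p = {(s, step(q0s, label[s])) for s in init}
    trans_p, label_p = set(), {}
    frontier, seen = list(init_p), set(init_p)
    while frontier:
        s, qs = node = frontier.pop()
        ends = {f"End_{n}" for n, q in zip(names, qs) if q in observers[n][2]}
        label_p[node] = frozenset(label[s]) | ends
        for a, b in trans:
            if a == s:
                succ = (b, step(qs, label[b]))
                trans_p.add((node, succ))
                if succ not in seen:
                    seen.add(succ)
                    frontier.append(succ)
    return init_p, trans_p, label_p


# Hypothetical example: P holds in s0 only, and M alternates s0 -> s1 -> s0.
label = {"s0": {"P"}, "s1": set()}
trans = {("s0", "s1"), ("s1", "s0")}
# Observer that accepts exactly the histories ending at even distance from the root.
parity = {"start": "even", "even": "odd", "odd": "even"}
observers = {"even": ("start", lambda q, v: parity[q], {"even"})}

init_p, trans_p, label_p = synchronous_product({"s0"}, trans, label, observers)
# AG(End_even => P) on M' reduces to a check over the reachable product states.
holds = all("P" in lab for lab in label_p.values() if "End_even" in lab)
```

In this sketch the past-time obligation has become an ordinary proposition of M',
so the final check needs no knowledge of QDDC.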

Theorem 2. Let $\hat{s} = (s, q_1^s, \ldots, q_n^s)$ where $q_i^s = \delta_i(q_i^0, L(s))$. Then

    $M, s \models \phi(D_1, \ldots, D_n)$  iff  $M', \hat{s} \models \phi(End_1, \ldots, End_n)$

Proof Outline. Consider the computation tree $T = Tr(M, s)$ in $M$ and the
corresponding computation tree $T' = Tr(M', \hat{s})$ in $M'$. There is a bijection
$\pi : T \to T'$ between the nodes of $T$ and $T'$ as follows. For every node
$k \in Tr(M, s)$ with state label $St(T, k) = s_k$, we have a node $\pi(k)$ with label

    $(s_k, \delta_1(q_1^0, L(hist(k))), \ldots, \delta_n(q_n^0, L(hist(k))))$.

(Here, we have extended the transition function $\delta_i$ from single valuations in $VAL$
to finite sequences of valuations.) From the above bijection, it is easy to prove that

    $T, k \models P$  iff  $T', \pi(k) \models P$.

The central property of $M'$ is that

    $T, k \models D_i$  iff  $T', \pi(k) \models End_i$.

From these, by structural induction on $\phi$, we can prove that

    $T, k \models \phi(D_1, \ldots, D_n)$  iff  $T', \pi(k) \models \phi(End_1, \ldots, End_n)$.  □

Corollary 2. $M \models \phi(D_1, \ldots, D_n)$  iff  $M' \models \phi(End_1, \ldots, End_n)$.  □

Note that if $M$ is finite-state then $M'$ is also a finite-state Kripke structure.
Moreover, $\phi(End_1, \ldots, End_n)$ is a pure CTL* formula which can be model checked
as follows.

Theorem 3 (Emerson and Lei [8]). For a finite-state Kripke structure $M$
and a CTL* formula $\phi$, there exists an algorithm to decide whether $M \models \phi$.

Corollary 3. $M \models \phi(D_1, \ldots, D_n)$ is decidable if $M$ is finite-state.  □

5 CTLDC: A Tool for Model Checking CTL[DC]

We have implemented the reduction outlined in Corollary 2 in a tool called
CTLDC. The tool is constructed on top of the QDDC validity checker DCVALID
[21,22]. The tool allows CTL[DC] specifications of SMV, Verilog and Esterel
designs to be model checked in conjunction with the verifiers SMV [18], VIS [2] and
Xeve [1], respectively. A separate report gives the details of usage and working
of this tool [23].

Given an SMV module $M$ and a formula $\phi(D_1, \ldots, D_n) \in$ CTL[DC], our tool
CTLDC gives the transformed SMV modules corresponding to $M'$ of Theorem
2 and also the formula $\phi(End_1, \ldots, End_n)$. We can then use SMV [18] to model
check whether $M' \models \phi(End_1, \ldots, End_n)$. The tool works in a similar fashion
for Verilog and Esterel designs. The reader may refer to [23] for details of these
transformations.

5.1 An Example: Vending Machine 

A vending machine accepts 5p and 10p coins. A chocolate costs 15p. We model
the working of such a machine by the following SMV module.

MODULE vend
VAR
  bal : {0,5,10,15};               -- balance deposited so far
  event : {p5,p10,choc,null};      -- event observed in the current step
INIT
  bal = 0 & !(event=choc)
TRANS
  next(bal) = case
    bal <= 10 & event=p5  : bal+5;
    bal <= 5 & event=p10  : bal+10;
    bal = 15 & event=choc : 0;
    event=null            : bal;
  esac

The following QDDC formula holds for all behaviour fragments satisfying the
condition that 15p worth of coins have been deposited and no chocolate has been
obtained.

    $fifteenp \stackrel{def}{=} (\lceil\lceil event \neq choc \rceil\rceil \land
        (\Sigma(event = p5) = 3 \lor (\Sigma(event = p5) = 1 \land \Sigma(event = p10) = 1)))$

Then, a possible extension of any behaviour ending with fifteenp is that a
chocolate is obtained next.

    $AG(true \frown fifteenp \Rightarrow EX(event = choc))$

Moreover, the only possible extensions are that a null event can occur or a
chocolate can be obtained.

    $AG(true \frown fifteenp \Rightarrow AX(event = choc \lor event = null))$

We consider the vending machine behaviours under a fairness condition which
states that non-null events are performed infinitely often, i.e. $(event \neq null)$
holds infinitely often. The following specification holds for all fair behaviours of the
vending machine. Here the path quantifier ranges over all fair behaviours.

    $A^{f}G \, A^{f}F \, (fifteenp \frown unit \frown \lceil event = choc \rceil^{0})$

The above three properties were checked using the tool CTLDC. In checking the
last property, we made use of the fair CTL model checking abilities of SMV.
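
To make the role of fifteenp concrete, the following Python sketch evaluates it
directly on finite event histories. This is only an illustration under simplifying
assumptions: the function names are hypothetical, a behaviour fragment is encoded
as a list of event names, and the counting terms are taken over every position of
the fragment, which glosses over the exact endpoint conventions of QDDC.

```python
def fifteenp(fragment):
    """No chocolate in the fragment, and either three 5p coins or one 5p plus one 10p."""
    if "choc" in fragment:
        return False
    fives = fragment.count("p5")
    tens = fragment.count("p10")
    return fives == 3 or (fives == 1 and tens == 1)


def ends_with_fifteenp(history):
    """Reading of 'true chop fifteenp': some suffix of the history satisfies fifteenp."""
    return any(fifteenp(history[m:]) for m in range(len(history)))


paid_in_fives = ends_with_fifteenp(["null", "p5", "p5", "p5"])
paid_mixed = ends_with_fifteenp(["p10", "p5"])
interrupted = ends_with_fifteenp(["p5", "choc", "p10"])
```

In the reduction, this history-dependent test is what the observer automaton for
fifteenp computes incrementally, exposing the result as the proposition End.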

Synchronous Bus Arbiter In a more substantial verification using CTLDC, we
checked some properties of the historic synchronous bus arbiter as modelled in
SMV by McMillan [18]. A synchronous bus arbiter with $n$ cells has request lines
$req_i$ and acknowledgement lines $ack_i$ for $1 \le i \le n$. At any clock cycle a subset
of the request lines are high. It is the task of the arbiter to set at most one of
the corresponding acknowledgement lines high. Preferably, the arbiter should be
fair to all requests. We refer the reader to McMillan's book [18] (Section 3.4.1)
for a detailed description of a specific synchronous arbiter circuit. Here, we are
mainly interested in its properties.

The following property states that if $req_i$ is held high for any interval of $m$
cycles then there must be an $ack_i$ during such an interval.

    $AG \, \Box((\lceil\lceil req_i \rceil \land (\eta = m)) \Rightarrow true \frown \lceil ack_i \rceil^{0} \frown ext)$

For an $n = 5$ cell arbiter, we found using CTLDC that the property holds for the
first cell for $m = n$. But for all other cells, it does not hold if $m < 2n$. For these
cells, the property does hold for $m = 2n$. Hence we concluded that the first cell
is guaranteed access to the bus if its request is held high for $n$ cycles, whereas for all
other cells the request must be held high for $2n$ cycles to guarantee access.

The following property asserts that the arbiter will not service a request $req_j$
first if an earlier request $req_i$ is still pending (the so-called "first come first serve"
policy (see [27])).

Let $fifo(i, j)$ be defined as

    $AG \, \Box(\lceil \neg req_j \rceil^{0} \frown \lceil\lceil req_i \land \neg ack_i \rceil\rceil \Rightarrow \neg\Diamond \lceil ack_j \rceil^{0})$

Surprisingly, McMillan's bus arbiter with 5 cells satisfies $fifo(i, j)$ for the
following pairs and for no other pairs. This was determined experimentally using
CTLDC.

(1,2), (1,3), (1,4), (1,5), (2,3), (3,4), (4,5) 

A much more comprehensive analysis of the performance of McMillan’s arbiter 
circuit, and its variants, can be found on the DCVALID web page [21]. 

6 Conclusions 

In this paper, we have proposed an extension of the logic CTL* to CTL*[DC].
This extension allows specification of past-time properties in CTL* using formulae
of Quantified Discrete-time Duration Calculus (QDDC). In our opinion, this
simple extension considerably facilitates formalisation of complex requirements.
CTL*[DC] is especially useful for expressing past-time requirements and quantitative
timing constraints. In a separate note, we give many such examples [24].
The properties in the previous section are also illustrative. Formally, the expressive
power of the logic CTL* is increased, as shown by Example 3. CTL*[DC] can
also be considered as a significant extension of QDDC which allows liveness and
branching properties to be stated.

We have shown a reduction of the CTL*[DC] model checking problem to the CTL*
model checking problem. The reduction relies upon the automata theoretic decision
procedure for QDDC. We believe that this approach is practically relevant
as a number of tools exist for CTL* and its subsets, CTL and LTL. We have
implemented this reduction in a tool called CTLDC which permits model
checking CTL[DC] specifications of finite-state transition systems.

The tool CTLDC can model check SMV, Verilog and Esterel designs by
reducing the model checking to a form which can be checked by the SMV, VIS and
Xeve tools respectively. In this sense, CTLDC is not a new model checker. It
enhances the functionality of SMV [18], VIS [2] and Xeve [1] by adding the ability
to model check a much richer logic, CTL[DC]. It enables complex properties
involving past and quantitative timing to be checked using existing checkers. A
separate report gives details of the implementation [23]. Currently, CTLDC is one
of the very few available tools for model checking CTL with past and timing. In
the context of Duration Calculi, CTLDC is the only tool allowing model checking.
(It should be noted that our original tool DCVALID [21] only checked for the
validity of QDDC formulae.)

The symbolic model checking algorithm for CTL has been extended to fair
CTL model checking [18]. It is easy to see that our reduction of CTL*[DC] model
checking to CTL* model checking by transforming the transition system M to M'
(Theorem 2) preserves fair paths. Hence, our reduction also gives a method for
CTL[DC] model checking under fairness constraints by reduction to fair CTL.

An important aspect of model checking is error trace generation. In our
reduction, an error trace of the transformed model M' in fact gives an error trace
for the original model M if we disregard (project out) the extra variables which
have been added by the transformation. Hence, the existing facilities for
counterexample generation in the reduced model can be used for CTL[DC].

Our approach of combining QDDC with CTL* can equally be used with any
other logic, say X, which specifies properties of finite state sequences. Moreover,
if the logic permits an automata theoretic decision procedure, this can be used
to reduce the model checking problem for CTL*[X] to CTL* by using exactly the
same transformation proposed here. One could consider a form of LTL over finite
sequences, or monadic logic over finite words, both of which have automata theoretic
decision procedures. Or one could use a form of regular expressions. Hence, the
approach presented here is quite generic. However, the expressive power (in a
pragmatic sense) and the facilities for quantitative timing constraint specification
which are found in QDDC may not be so easily available in all such logics.

The issue of the complexity of CTL*[DC] merits discussion and further
investigation. As stated in Section 2, even the subset QDDC of CTL*[DC] has a
non-elementary lower bound on the complexity of validity checking [22]. The same
lower bound carries over to model checking of CTL*[DC] formulae. Such high
complexity can potentially be a source of infeasibility and may sound hopeless.
However, this complexity is rarely seen in practice. In fact, we have been able to
check many formulae which are 5-6 pages long with our tool (see Pandya [21,22]
for substantial examples and performance measurements). However, we have also
encountered a few pathological formulae leading to state space explosion.

It has long been recognised [17] that the availability of past time modalities in
temporal logics can considerably facilitate formulation of complex properties.
There have been several formulations extending LTL and CTL with past. Their
model checking problem has also been investigated [14,15]. Extensions of CTL
such as RTCTL [9] allow quantitative timing properties to be expressed and to
be model checked using tools such as NuSMV [6]. A precise comparison of the
formal expressive power of our logics CTL[DC] and CTL*[DC] with these logics
is currently under investigation. We conjecture that CTL[DC] is strictly more
expressive than the logic $CTL_{lp}$ proposed by Kupferman and Pnueli [14].

Abstract. CPN/Tools is a major redesign of the popular Design/CPN
tool for editing, simulation and state space analysis of Coloured Petri
Nets. The new interface is based on advanced interaction techniques,
including bi-manual interaction, toolglasses and marking menus and a
new metaphor for managing the workspace. It challenges traditional ideas
about user interfaces, getting rid of pull-down menus, scrollbars, and even
selection, while providing the same or greater functionality. CPN/Tools
requires an OpenGL graphics accelerator and will run on all major platforms
(Windows, Unix/Linux, MacOS).



1 The CPN/Tools Interface 

Interaction techniques for desktop workstations have changed little since the
creation of the Xerox Star in the early eighties. The vast majority of today's
interfaces are still based on a single mouse and keyboard to manipulate windows,
icons, menus, dialog boxes, and to drag and drop objects on the screen. While
these interfaces are now ubiquitous they are also reaching their limits: as new
applications become more powerful, the corresponding interfaces become too
complex. CPN/Tools [1] addresses this trade-off between power and ease-of-use
by combining new interaction techniques into a consistent and simple interface
for editing and simulating Coloured Petri Nets [2,3].

Coloured Petri Nets frequently contain a large number of pages, which are
similar to modules in programming languages. In CPN/Tools we have designed a
window manager that makes it easy to manage these modules. The workspace
occupies the whole screen and contains window-like objects called binders. Binders
contain pages, each equivalent to a window in a traditional environment. Each
page has a tab similar to those found in tabbed dialogs. Clicking the tab brings
that page to the front of the binder. A page can be dragged to a different binder
or to the background to create a new binder for it. Binders reduce the number
of windows on the screen and the time spent organizing them. Binders also help
users organize their work by grouping related pages together and reducing the
time spent looking for hidden windows.

CPN/Tools supports multiple views, allowing several binders to contain a 
representation of the same page. For example one binder can contain a view on 
a page including simulation information while another binder can contain a view 
on the same page without simulation information and at a smaller scale (Fig. 1).

The CPN/Tools interface requires a keyboard and two pointing devices. For 
a right-handed user we use a mouse for the right hand and a trackball for the 
left hand. The mouse is used for tasks that may require precision, while the 
trackball is used for tasks that do not require much precision e.g. moving tools. 
For simplicity we assume a right-handed user in our description of interaction 
techniques. 

The interface has no menu bars, no pull-down menus, no scrollbars and no 
dialog boxes. Instead, it uses a unique combination of traditional, recent and 
novel interaction techniques: 

Direct manipulation (i.e. clicking or dragging objects) is used for frequent 
operations such as moving objects, panning the content of a view and editing 
text. When a tool is held in the right hand, e.g. after having selected it in a 
floating palette, direct manipulation actions are still available via a long click,
i.e. pressing the mouse button, waiting for a short delay until the cursor changes, 
and then either dragging or releasing the mouse button. 

Bi-manual manipulation is a variant of direct manipulation that involves 
using both hands for a single task. It is used to resize objects (binders, places, 
transitions, etc.) and to zoom the view of a page. The interaction is similar to 
holding an object with two hands and stretching or shrinking it. Unlike tradi- 
tional window management techniques, using two hands makes it possible to 
simultaneously resize and move a binder, or pan and zoom the view of a page. 

Marking Menus [5] are radial, contextual menus that appear when click- 
ing the right button of the mouse. Marking menus offer faster selection than 
traditional linear menus for two reasons. First, it is easier for the human hand 
to move the cursor in a given direction than to reach for a target at a given 
distance. Second, the menu does not appear when the selection gesture is ex- 
ecuted quickly, which supports a smooth transition between novice and expert 
use. Kurtenbach and Buxton [5] have shown that selection times can be more 
than three times faster than with traditional menus. 

Keyboard input is used only to edit text. Some navigation commands are 
available at the keyboard to make it easier to edit several inscriptions in sequence 
without having to move the hands to the pointing devices. Keyboard modifiers 
and shortcuts are not necessary since most of the interaction is carried out with 
the two hands on the pointing devices. 

Floating palettes contain tools represented by buttons. Clicking a tool
with the mouse activates this tool, i.e. the user conceptually holds the tool in the
hand. Clicking on an object with the tool in hand applies the tool to that object.
Floating palettes are moved with the left hand, making it easy to bring the tools
close to the objects being manipulated, and saving the time spent moving the
cursor to a traditional menubar or toolbar. Floating palettes can be dropped in
the workspace and become a standard tool palette. In many current interfaces,
after a tool is used (especially a creation tool), the system automatically activates
a “select” tool. This supports a frequent pattern of use in which the user wants
to move an object immediately after it has been created but causes problems
when the user wants to create additional objects of the same type. CPN/Tools
avoids this automatic changing of the current tool by ensuring that the user can
always move an object, even when a tool is active, with a long click of the mouse.
This mimics the situation in which one continues holding a physical pen while
moving an object out of the way in order to write.

Fig. 1. The CPN/Tools interface. The left column is called the index. The top-left
and top-right binders contain different views of the same page. The views
are scaled differently and the view in the top-left binder contains simulation
information. The bottom binder contains six pages represented by tabs. The
page in front shows several magnetic guidelines (dashed lines). In the top-right
binder a circular marking menu has been popped up on a page. The palette
with the VCR-like controls is a floating palette, while a toolglass is positioned
over the page in the bottom binder. This toolglass can be used to edit colours,
line types and line thicknesses.

Toolglasses [4], like floating palettes, contain a set of tools represented by
buttons, and are moved with the left hand, but unlike floating palettes, they
are semi-transparent. A tool is applied to an object with a click-through action:
the tool is positioned over the object of interest and the user clicks through
the tool onto the object. The toolglass disappears when the tool requires a drag
interaction, e.g. when creating an arc. This prevents the toolglass from getting
in the way and makes it easier to pan the document with the left hand when
the target position is not visible. This is a case where the two hands operate
simultaneously but independently.

Magnetic guidelines are used to align objects and keep them aligned. Moving
an object near a guideline causes the object to snap to it. Objects can be
removed from a guideline by dragging them away from it. Moving a guideline
moves all the objects attached to it, maintaining their alignment.
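
The behaviour of magnetic guidelines can be summarised in a few lines of Python.
This is a hypothetical sketch, not code from CPN/Tools: the class names, the
restriction to horizontal guidelines and the snapping threshold `SNAP_DISTANCE`
are assumptions introduced only to illustrate the three rules stated above.

```python
from dataclasses import dataclass, field

SNAP_DISTANCE = 5.0  # assumed threshold, in arbitrary screen units


@dataclass(eq=False)  # identity-based equality: two nodes at one spot stay distinct
class Node:
    x: float
    y: float


@dataclass(eq=False)
class HorizontalGuideline:
    y: float
    attached: list = field(default_factory=list)

    def drop(self, node, x, y):
        """Move node to (x, y); snap and attach it when it lands near the guideline."""
        node.x = x
        if abs(y - self.y) <= SNAP_DISTANCE:
            node.y = self.y
            if node not in self.attached:
                self.attached.append(node)
        else:
            # Dragging the node away detaches it from the guideline.
            node.y = y
            if node in self.attached:
                self.attached.remove(node)

    def move_to(self, y):
        """Moving the guideline carries every attached node with it."""
        self.y = y
        for node in self.attached:
            node.y = y


guide = HorizontalGuideline(y=100.0)
place = Node(0.0, 0.0)
guide.drop(place, 10.0, 103.0)   # lands within the threshold, so it snaps
snapped_y = place.y
guide.move_to(50.0)              # the attached node follows the guideline
followed_y = place.y
guide.drop(place, 10.0, 80.0)    # dragged away, so it detaches
guide.move_to(0.0)
detached_y = place.y
```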

Preliminary results from our user studies make it clear that none of the
above techniques is always better or worse. Rather, each emphasizes a different,
but common pattern of use. Marking menus work well when applying multiple
commands to a single object. Floating palettes work well when applying the same
command to different objects. Toolglasses work well when the work is driven by
the structure of the diagram, such as working around a cycle in a Petri net.

2 Conclusion 

CPN/Tools combines advanced interaction techniques into a consistent interface
for editing and simulating Coloured Petri Nets. These interaction techniques
have proved to be very efficient when working with Coloured Petri Nets, and we
believe they will be suitable for graph editing and layout in general. Information
about CPN/Tools including a more detailed tool presentation can be found at
http://www.daimi.au.dk/CPnets/CPN2000.

Abstract. Gurevich’s Abstract State Machines (ASMs) constitute a
high-level state-based modelling language, which has been used in a wide
range of applications. The ASM Workbench is a comprehensive tool environment
supporting the development and computer-aided analysis and
validation of ASM models. It is based on a typed version of the ASM
language, called ASM-SL, and includes features for type-checking, simulation,
debugging, and verification of ASM models.



1 Introduction 

The ASM Workbench is a comprehensive tool environment supporting the
development and computer-aided analysis and validation of Abstract State Machine
models. Abstract State Machines (ASMs), defined by Yuri Gurevich in [4], are
an effective approach for specifying and modelling state-based systems, which
combines transition systems, used for modelling the dynamic aspects of a system,
i.e., its behaviour, with first-order structures (algebras), used to model the
static aspects, e.g., data types.

The ASM Workbench is based on a language called ASM-SL (ASM-based 
Specification Language), which extends the ASM language as defined in [4] by 
a type system and by constructs to define data types and functions (such ex- 
tensions are very convenient in order to provide tool support for ASMs). The 
Workbench itself consists of a kernel, providing basic support for ASM tool de- 
velopment, and a set of tools built upon this kernel, which include a type-checker, 
a simulator, a debugger-like GUI, and a model-checker interface for formal veri- 
fication support. 

In this abstract, after recalling the basic ideas of Abstract State Machines 
(Sect. 2), we give an overview of the ASM-SL language (Sect. 3) and of the 
architecture of the ASM Workbench and of the tools it includes (Sect. 4). A 
complete account of ASM-SL and of the ASM Workbench, as well as of the 
underlying concepts and techniques, can be found in [1].

2 Gurevich’s Abstract State Machines

Abstract State Machines (ASMs), introduced by Yuri Gurevich in [4], are a
high-level state-based language for modelling discrete dynamic systems, which
has been used in a wide range of applications, such as specifications of hardware
and software architectures and operational semantics of programming languages
(see [5] for a comprehensive overview of applications of ASMs).

The underlying computational model is essentially the well-known model of
transition systems. Computations (runs) are finite or infinite sequences of states
$(S_i)$, obtained from the initial state $S_0$ by repeatedly executing transitions $\delta_i$:

    $S_0 \xrightarrow{\delta_1} S_1 \xrightarrow{\delta_2} S_2 \cdots \xrightarrow{\delta_n} S_n \cdots$

In the simple case of deterministic ASMs without any communication with an
external environment, there will be exactly one run. Otherwise, the set of possible
runs can be represented, as usual, by means of a set $S_0 \subseteq S$ of initial states and
a transition relation $R \subseteq S \times S$.

The peculiarity of ASMs is that states are first-order structures (algebras) 
over a given vocabulary T. In the traditional definition of transition systems, 
states are identified by the value of a finite number of state variables. In ASMs, 
instead, states are identified by the interpretation of function names from T, 
which are classified as static, dynamic, and external. Each transition may change 
the interpretation of dynamic function names in a finite number of places.¹
External function names are used to model the environment (their interpretation 
may change from state to state depending on the environment behaviour, like 
inputs of finite state machines), while static function names never change their 
interpretation (they typically correspond to operations on some data types). 

A language of transition rules is defined in [4], which allows ASM transitions to be
specified. The most essential transition rule is the so-called update rule, of the
form $f(t_1, \ldots, t_n) := t$, where $f$ is an n-ary dynamic function name, and
$t_1, \ldots, t_n, t$ are terms over the vocabulary T. This rule has the effect of changing
the interpretation of $f$ such that $f_{S_{i+1}}(S_i(t_1), \ldots, S_i(t_n)) = S_i(t)$. More complex
transition rules can be built by means of additional rule constructors, such as conditionals,
parallel composition, non-deterministic choice, etc.
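
The effect of update rules can be sketched in Python as follows. This is an
illustrative model only, not the ASM Workbench evaluator: the encoding of a state
as a dictionary from function names to argument/value tables, the representation
of a rule as a Python function returning update triples, and the name
`fire_updates` are all assumptions. The sketch evaluates every term in the current
state and only then applies the collected updates, so that rules composed in
parallel do not observe each other's effects.

```python
def fire_updates(state, rules):
    """Perform one step: evaluate all rules in `state`, then apply their updates at once.

    state -- dict: function name -> dict mapping argument tuples to values
    rules -- iterable of functions; each maps a state to a list of
             (function_name, argument_tuple, new_value) updates
    """
    update_set = {}
    for rule in rules:
        for name, args, value in rule(state):
            location = (name, args)
            # Two different values for one location cannot both be applied.
            if location in update_set and update_set[location] != value:
                raise ValueError(f"inconsistent update at {location}")
            update_set[location] = value
    # Copy the interpretation so that the current state is left untouched.
    next_state = {name: dict(table) for name, table in state.items()}
    for (name, args), value in update_set.items():
        next_state.setdefault(name, {})[args] = value
    return next_state


# Two 0-ary dynamic functions a and b, and the parallel rules a := b, b := a.
state = {"a": {(): 1}, "b": {(): 2}}
rules = [
    lambda s: [("a", (), s["b"][()])],
    lambda s: [("b", (), s["a"][()])],
]
next_state = fire_updates(state, rules)
```

Because both right-hand sides are evaluated in the old state, the two rules
together swap the values of a and b, which a sequential reading would not do.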



3 The ASM-SL Notation 

For the purpose of equipping the ASM method with tool support, it is necessary
to extend the basic ASM language of [4]. In particular, the ASM definition
does not indicate how to define universes and (static) functions, i.e., the data
model underlying the transition system. Clearly, there are several options for
specifying data, e.g., axiomatic descriptions in the style of algebraic specification.
However, in order to obtain executable specifications, a model-based approach
was adopted, in the style of VDM [6]. A type system was also added (originally,
ASMs are untyped). The result is ASM-SL², the source language of the ASM
Workbench, which extends the ASM language by some constructs borrowed from
ML and VDM. The main features of ASM-SL can be summarized as follows:

- Specification of behaviour based on the ASM language of transition rules [4].

- Polymorphic type system based on the type system of Standard ML [8].

- Model-based approach to data specification, including: predefined elementary
  types (booleans, integers, strings) and type constructors (tuples, lists,
  finite sets, finite maps), user-definable free types, comprehension notation,
  pattern matching, recursive and mutually recursive function definitions.

¹ In this sense, ASMs constitute a generalization of transition systems (which can be
considered as a special case of ASMs, where all dynamic function names are 0-ary).

Fig. 1. The ASM Workbench Tool Environment

4 The ASM Workbench Tools 

The ASM Workbench consists of a kernel, a set of modules implemented in
the functional language Standard ML [8], which provide basic functionalities
(such as parser, type-checker, pretty-printer, and an interpreter-based evaluator),
and a few additional components (Graphical User Interface, model-checker
interface), built on top of the kernel. Figure 1 shows the rough architecture
of the Workbench, which includes the mentioned tools and could be easily
extended by additional components (e.g., code generators) by reusing the kernel
functionalities. For reasons of space, it is not possible to go into details of the
tool architecture. Instead, we give an overview of the existing tools.

² ASM-based Specification Language.

The type-checker for ASM-SL is based on an efficient implementation of the 
well-known unification-based type inference algorithm [2]. In addition to type- 
checking, it performs other simple static checks. It can be used as a standalone 
tool or as a preprocessor for further elaborations. 

The interpreter allows ASM runs to be simulated while keeping track of the
computation history. In this way, computation steps can also be retracted (backward
step feature). It is also possible to simulate ASM models which interact with the
environment by means of external functions. This can be done by means of an
oracle process, which communicates with the interpreter in order to provide it
with the values of the external functions, whenever needed.
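
The combination of history keeping, backward steps and an oracle for external
functions can be pictured with a small Python sketch. It makes no claim about how
the ASM Workbench interpreter is built: the class `Simulator`, its method names
and the modelling of the oracle as a function of the step number are hypothetical.

```python
class Simulator:
    """Runs a step function while recording every state that has been visited."""

    def __init__(self, initial_state, step, oracle):
        self.history = [initial_state]  # history[0] is the initial state
        self._step = step               # (state, external_value) -> next state
        self._oracle = oracle           # step number -> value of the external function

    @property
    def state(self):
        return self.history[-1]

    def forward(self):
        """Ask the oracle for the external value, then perform one step."""
        external = self._oracle(len(self.history) - 1)
        self.history.append(self._step(self.state, external))
        return self.state

    def backward(self):
        """Retract the most recent step; the initial state is never removed."""
        if len(self.history) > 1:
            self.history.pop()
        return self.state


# Hypothetical model: a counter that adds the value supplied by the environment.
inputs = [5, 7, 1]
sim = Simulator(0, lambda s, ext: s + ext, lambda n: inputs[n])
first = sim.forward()      # 0 + 5
second = sim.forward()     # 5 + 7
undone = sim.backward()    # back to the state after the first step
replayed = sim.forward()   # the oracle is asked again for step 1
```

Storing whole states keeps the backward step trivial; a real tool might prefer to
store only the updates performed in each step.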

A Graphical User Interface (GUI) allows the user to control the simulation and
inspect its results, providing all the typical features of a debugger (browsing
through the code, performing single steps forward or backward, setting breakpoints,
observing the values of some terms, etc.).

Finally, a model-checker interface provides support for formal verification of 
finite-state ASM models. Although ASM models have, in general, an infinite state 
space, the ASM-SL language provides a syntactic construct — so-called “finiteness 
constraints” — by which the finiteness of the model can be enforced by local 
modifications (restricting the ranges of dynamic and external functions to finite 
sets). Then, the ASM model is translated, by applying transformation techniques
(unfolding and flattening of transition rules), into a model amenable to
model-checking. The actual verification is performed by the SMV model-checker [7].
The ASM model is translated into the SMV language and then checked against 
a set of CTL formulae, to be provided separately (for details, see [3,1]).

1 Introduction

The functional programming language Erlang was developed by the Ericsson cor- 
poration to address the complexities of developing large-scale programs within 
a concurrent and distributed setting. It is successfully used in the design and 
implementation of telecommunication systems. 

Software written for this application domain usually has to meet high qual- 
ity demands such as correctness. Due to the high degree of concurrency and to 
the dynamic behaviour of systems, testing is generally not sufficient to guaran- 
tee these properties to a satisfactory degree. We therefore follow a verification 
approach, i.e., we employ formal methods to prove that a telecommunication sys- 
tem implemented in Erlang has certain properties specified in a suitable logic. 

In view of the complexity of the verification problem in general, it is mandatory
to provide the user with powerful tool support. Therefore, in 1997 the
development of the Erlang Verification Tool (EVT) was started at the Swedish
Institute of Computer Science in collaboration with, and with financial support
from, the Ericsson Computer Science Lab and the Swedish ASTEC (Advanced
Software TEChnology) competence centre.

To cope with the challenges of software verification in a highly dynamic set- 
ting, a semi-automatic theorem-proving approach was chosen as the underlying 
framework. EVT is a proof assistant which offers powerful induction techniques 
to handle dynamic process creation and unbounded data structures. Its graph- 
ical user interface provides comfortable access to proof resources. Several case 
studies such as a billing agent [3] and a distributed database lookup manager [2] 
have demonstrated the usefulness of the tool. 

2 Foundations 

2.1 The Erlang Programming Language 

Erlang [1] is a concurrent functional programming language for implementing
dynamic networks of processes operating on data types such as integers, lists,
tuples, or process identifiers (pids), using asynchronous, call-by-value
communication via unbounded ordered message queues called mailboxes.

*** Most of the work was done during the author’s employment at the Department of
Teleinformatics, Royal Institute of Technology (KTH), Stockholm, Sweden.

The following code fragment specifies a simple concurrent server which repeatedly
accepts an incoming query in the form of a triple which is tagged by the
request constant, and which contains the request itself (matched by the variable
Request) and the pid of the client process (Client). It then spawns off a process
to serve the request (by using the handle function, which is not considered
here), and sends the result back to the client as a tuple tagged by the response
constant.

server() ->
    receive
        {request, Request, Client} ->
            spawn(serve, [Request, Client]),
            server()
    end.

serve(Request, Client) ->
    Client ! {response, handle(Request)}.

The starting point of any kind of rigorous verification is a formal semantics. 
Here we use an operational semantics (a variant of the semantics presented 
in [3]) by associating a transition system with an Erlang program, giving a 
precise account of its possible computations. The states are parallel products 
of processes, each of the form (e, p, q), where e is an Erlang expression to be
evaluated in the context of a unique pid p and a mailbox q for incoming messages. 
A set of rules is provided to derive labelled transitions between such states. 
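
To make the shape of such states concrete, the following Python sketch models a state as a tuple of (e, p, q) triples and derives labelled transitions for a toy expression language. The expression forms ("send", "recv", "done"), the label strings, and the names Proc, steps and run are assumptions made for this illustration; they are not the rules of the semantics in [3].

```python
from dataclasses import dataclass, replace
from typing import Iterator, List, Tuple

# Toy expressions (hypothetical, not Erlang syntax):
#   ("send", target_pid, value, rest)  send value, then continue as rest
#   ("recv", rest)                     consume the oldest message, continue as rest
#   ("done",)                          terminated


@dataclass(frozen=True)
class Proc:
    expr: tuple           # e: expression still to be evaluated
    pid: int              # p: unique process identifier
    mailbox: Tuple = ()   # q: ordered queue of incoming messages


State = Tuple[Proc, ...]  # parallel product of processes


def steps(state: State) -> Iterator[Tuple[str, State]]:
    """Yield every (label, successor) pair derivable from `state`."""
    pids = {proc.pid for proc in state}
    for i, proc in enumerate(state):
        kind = proc.expr[0]
        if kind == "send":
            _, target, value, rest = proc.expr
            procs = list(state)
            procs[i] = replace(proc, expr=rest)
            if target in pids:
                # Internal communication: append to the target's mailbox.
                j = next(k for k, p in enumerate(procs) if p.pid == target)
                procs[j] = replace(procs[j], mailbox=procs[j].mailbox + (value,))
                yield "tau", tuple(procs)
            else:
                # The target lives in the environment: visible output action.
                yield f"{target}!{value}", tuple(procs)
        elif kind == "recv" and proc.mailbox:
            _, rest = proc.expr
            procs = list(state)
            procs[i] = replace(proc, expr=rest, mailbox=proc.mailbox[1:])
            yield "tau", tuple(procs)


def run(state: State) -> List[str]:
    """Follow the first enabled transition until none is left."""
    labels = []
    while True:
        successors = list(steps(state))
        if not successors:
            return labels
        label, state = successors[0]
        labels.append(label)


start = (
    Proc(("send", 2, "ping", ("done",)), pid=1),
    Proc(("recv", ("send", 99, "pong", ("done",))), pid=2),
)
trace = run(start)
```

Here process 1 sends to process 2 inside the system, which is recorded as an internal step, while the reply to pid 99 leaves the system and therefore shows up as an output label.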

This semantics is embedded in the proof system by allowing transition as- 
sertions to be used as atomic propositions, and by considering the modalities as 
derived formulae that are defined in terms of the transition relation (analogously 
to the treatment of CCS in [5,4]). 



2.2 The Property Specification Logic 

The logic that we use to capture the desired behaviour of Erlang programs 
and their components combines both temporal and functional features. It can 
be characterized as a many-sorted first-order predicate logic, extended with 
temporal modalities, least and greatest fixed-point operators, and some Erlang- 
specific atomic predicates. Due to its generality, it can be used to express a 
wide range of important system properties, ranging from more static, type-like 
assertions to complex safety and liveness features of the interaction between 
processes. An example of the latter is the following property of our concurrent
server. It expresses that the process stabilizes on internal (τ) and output (!)
actions, that is, only a finite number of these can occur consecutively:

stabilizes = \]Z. 
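
On a finite transition system, a least fixed point of this kind can be computed by iterating from the empty set. The Python sketch below assumes one natural reading of the property (a state stabilizes if all of its internal and output successors do); the state names, the action kinds "tau", "out" and "in", and the function name are hypothetical. EVT itself establishes such properties by proof rather than by enumerating states, so this is only meant to illustrate the fixed-point reading.

```python
from typing import Dict, List, Set, Tuple

# Labelled transition system: state -> list of (action kind, successor).
LTS = Dict[str, List[Tuple[str, str]]]


def stabilizing_states(lts: LTS) -> Set[str]:
    """Least fixed point of Z = {s | every tau/out successor of s is in Z}."""
    z: Set[str] = set()
    while True:
        nxt = {
            s
            for s, edges in lts.items()
            if all(t in z for kind, t in edges if kind in ("tau", "out"))
        }
        if nxt == z:
            return z
        z = nxt


lts: LTS = {
    "idle": [("in", "working")],       # waits for input only
    "working": [("tau", "replying")],  # one internal step
    "replying": [("out", "idle")],     # one output, then waits again
    "spin": [("tau", "spin")],         # infinite internal loop
}
stable = stabilizing_states(lts)
```

The state "spin" is excluded because it admits an infinite sequence of internal steps, whereas the other three states can perform only finitely many internal or output actions before an input is required.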



2.3 The Proof System 

At the very heart of our approach is a tableau-based proof system [3,4] embodying
the basic proof rules by which complex correctness assertions can be reduced
to (hopefully) less complex ones. It operates on Gentzen-style sequents of the
form Γ ⊢ Δ, where Γ and Δ are sets of assertions representing the premises and
the conclusions, respectively, of the proof. For example, the following sequent
expresses that the server process has the stabilizes property provided that
the same holds, given any request r, for the handle function:

∀q.∀r.(handle(r), p, q) : stabilizes ⊢ ∀q.(server(), p, q) : stabilizes

Summarizing, our proof system can be characterized by the following attributes:

— laziness: only those parts of the state space and of the formula are taken
into account which are needed to establish the desired property, and so-called
metavariables are used to postpone the choice of witnesses in a proof
rule.

— parametricity: by representing parts of the system by placeholder variables
(parameters), a relativised reasoning style for open systems is supported.

— compositionality: using a Cut rule, the end-system requirements can be
decomposed in a modular and iterative way into properties to hold of the
component processes.

— induction: to support reasoning about unbounded (or even non-well-founded)
structures, inductive and co-inductive reasoning is provided based on the
explicit approximation of fixed points and on well-founded ordinal induction.
Inductive assertions are discovered during the proof rather than being
enforced right from its beginning.

— reuse: by sharing subproofs and using lemmata, already established properties
can be reused in different settings.



3 The EVT Tool 

The verification task in our setting amounts to the goal-directed construction
of a proof tree. Starting from the root with a goal sequent like the stabilization
property of the server, the proof tree is built up by successively applying tactics
until every leaf is labeled by an axiom. These tactics can attempt to construct
complete proofs, or they can return partial proofs, stopping once in a while to
prompt the user for guidance. The atomic tactics are given by the rules of the
proof system, and more complex ones can be built up using special combinators
called tacticals to obtain more powerful proof steps and, thus, to automate
proofs, and to raise the abstraction level of reasoning.
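
The interplay of atomic tactics and tacticals can be sketched in a few lines of Python. The sketch assumes a toy propositional setting in which a sequent is a pair of formula sets and a tactic maps a goal to its remaining subgoals (or to None if it does not apply). The names axiom, and_right, or_else and repeat are hypothetical and do not reflect EVT's actual tactic language.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

# A sequent is (premises, conclusions). Atoms are strings and
# ("and", a, b) stands for a conjunction.
Sequent = Tuple[FrozenSet, FrozenSet]
# A tactic returns the subgoals it leaves open, or None if it does not apply.
Tactic = Callable[[Sequent], Optional[List[Sequent]]]


def axiom(goal: Sequent) -> Optional[List[Sequent]]:
    """Atomic tactic: close a goal whose premises and conclusions overlap."""
    premises, conclusions = goal
    return [] if premises & conclusions else None


def and_right(goal: Sequent) -> Optional[List[Sequent]]:
    """Atomic tactic: split a conjunction among the conclusions."""
    premises, conclusions = goal
    for f in conclusions:
        if isinstance(f, tuple) and f[0] == "and":
            rest = conclusions - {f}
            return [(premises, rest | {f[1]}), (premises, rest | {f[2]})]
    return None


def or_else(first: Tactic, second: Tactic) -> Tactic:
    """Tactical: try `first`, fall back to `second` if it does not apply."""
    def tactic(goal: Sequent) -> Optional[List[Sequent]]:
        result = first(goal)
        return result if result is not None else second(goal)
    return tactic


def repeat(tactic: Tactic) -> Tactic:
    """Tactical: apply `tactic` to a goal and its subgoals until it fails."""
    def repeated(goal: Sequent) -> List[Sequent]:
        subgoals = tactic(goal)
        if subgoals is None:
            return [goal]  # partial proof: leave this goal to the user
        remaining: List[Sequent] = []
        for sub in subgoals:
            remaining.extend(repeated(sub))
        return remaining
    return repeated


auto = repeat(or_else(axiom, and_right))
proved = auto((frozenset({"A", "B"}), frozenset({("and", "A", "B")})))
partial = auto((frozenset({"A"}), frozenset({("and", "A", "B")})))
```

An empty result means that every leaf was closed by an axiom. A non-empty result is a partial proof whose open goals would be handed back to the user for guidance.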

Besides the actual construction of the proof tree, EVT provides support
for proof reuse, navigation, and visualization. Due to the global character of
the discharge rule, which checks whether the current node of the proof tree is
labeled by an instance of a previous proof sequent, the whole proof tree has to
be maintained. EVT can thus be understood as a proof tree editor, extended
with facilities for semi-automatic proof search.
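
The need to keep the whole tree can be illustrated by a small Python sketch of a proof tree whose nodes remember their parents. It makes two simplifying assumptions that are not taken from the paper: only nodes on the path to the root are searched, and "instance of" is approximated by plain equality of sequent labels. The side conditions of the actual discharge rule are omitted, and the names Node and find_discharger are hypothetical.

```python
from typing import List, Optional


class Node:
    """A proof tree node labeled by a sequent (kept as plain text here)."""

    def __init__(self, sequent: str, parent: Optional["Node"] = None) -> None:
        self.sequent = sequent
        self.parent = parent
        self.children: List["Node"] = []

    def add_child(self, sequent: str) -> "Node":
        child = Node(sequent, parent=self)
        self.children.append(child)
        return child


def find_discharger(node: Node) -> Optional[Node]:
    """Return the nearest ancestor carrying the same sequent, if any."""
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.sequent == node.sequent:
            return ancestor
        ancestor = ancestor.parent
    return None


root = Node("G |- server() : stabilizes")
unfolded = root.add_child("G |- receive ... end : stabilizes")
recursive = unfolded.add_child("G |- server() : stabilizes")
```

In this example the leaf recursive repeats the root sequent, so find_discharger(recursive) returns root, while find_discharger(unfolded) returns None. The lookup is only possible because earlier nodes are still available.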

A screen snapshot of EVT's graphical user interface is given in Fig. 1. It
provides easy, context-sensitive access to proof resources such as proof
trees and their nodes, tactics, lemmata, and others. In the concrete situation, an
example goal sequent is shown (with the premises and the conclusions above and
below the turnstile, respectively), and the user is just about to select a tactic
which should be applied to the highlighted assertion.

Fig. 1. The graphical user interface

A binary version of EVT can be downloaded from

ftp://ftp.sics.se/pub/fdt/evt/index.html

It is available for Intel x86 and SPARC architectures running under Linux and
Solaris, respectively.